2006/04/19 | PDF Reader
Category: Programme | Posted at 14:22

Simple iText

Reading PDF
PdfReader
You can't 'parse' an existing PDF file using iText; you can only 'read' it page by page.
What does this mean?
The PDF format is just a canvas on which text and graphics are placed, without any structural information. As such, there aren't any 'iText objects' in a PDF file. Each page will probably contain a number of strings, but you can't reconstruct a phrase or a paragraph from these strings. There are probably a number of lines drawn, but you can't retrieve a Table object based on these lines. In short: parsing the content of a PDF file is NOT POSSIBLE with iText, at least not if you want good results, although there are ways to retrieve text from an existing PDF. Post your question on the newsgroup news://comp.text.pdf and maybe you will get some answers from people who have built tools that can parse PDF and extract some of its contents, but don't expect tools that perform a bullet-proof conversion to structured text.
What iText DOES provide is the ability to READ a PDF document and copy entire pages of it into the PDF file you are constructing from scratch. This is useful if you want to create a new document based on one or more existing documents, for example to add a watermark or page numbers.
Chap13_pdfreader takes a PDF file from Chapter 7 and creates a new document in which 4 pages of the original document are painted on 1 page of the new document. We also added a watermark and page numbers (see Chap13_pdfreader.pdf). To fully understand the code, and how to adapt it to your needs, you will have to read Chapter 10 first.
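The 4-on-1 layout described above comes down to a little geometry: each original page is scaled down and shifted into one cell of a grid. The following is only a sketch of that arithmetic, not the code of Chap13_pdfreader. The class NUpLayout, its method slotFor and the A4 size of 595 x 842 points are assumptions made for illustration, and it uses no iText classes at all. It relies on the fact that the default PDF coordinate system has its origin in the lower-left corner of the page.

```java
/** Hypothetical helper: where does each source page land when several share one sheet? */
final class NUpLayout {

    /** Uniform scale factor plus the lower-left corner of the scaled page, in points. */
    static final class Slot {
        final double scale;
        final double x;
        final double y;

        Slot(double scale, double x, double y) {
            this.scale = scale;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Computes the slot of the source page with the given zero-based index on a
     * cols x rows grid. Source pages and target sheet are assumed to have the same size.
     */
    static Slot slotFor(int index, int cols, int rows, double pageWidth, double pageHeight) {
        double cellWidth = pageWidth / cols;
        double cellHeight = pageHeight / rows;
        // One scale factor for both axes, so the page keeps its aspect ratio.
        double scale = Math.min(cellWidth / pageWidth, cellHeight / pageHeight);
        int col = index % cols;
        int row = index / cols;
        // The PDF origin is at the bottom left, so the top row gets the largest y.
        double x = col * cellWidth + (cellWidth - pageWidth * scale) / 2;
        double y = (rows - 1 - row) * cellHeight + (cellHeight - pageHeight * scale) / 2;
        return new Slot(scale, x, y);
    }

    public static void main(String[] args) {
        // Four pages on one A4 sheet (assumed size: 595 x 842 points).
        for (int i = 0; i < 4; i++) {
            Slot s = slotFor(i, 2, 2, 595, 842);
            System.out.println("page " + (i + 1) + ": scale=" + s.scale + " x=" + s.x + " y=" + s.y);
        }
    }
}
```

With a 2 x 2 grid on an A4 sheet, every page is drawn at half size: the first page goes to the upper-left quadrant (x = 0, y = 421) and the fourth to the lower-right quadrant (x = 297.5, y = 0). These numbers would then be passed as scaling and translation to whatever call paints the imported page.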
If you have an existing PDF file that represents a form, you could copy the pages of this form and paint text at precise locations on it. You can't edit an existing PDF document by saying, for instance, 'replace the word Louagie with Lowagie'. To achieve this, you would have to know the exact location of the word Louagie, paint a white rectangle over it and paint the word Lowagie on top of this white rectangle. Please avoid this kind of 'patch' work. Do your PDF editing with an Adobe product.

com.lowagie.tools.*
In package com.lowagie.tools, there are 4 little tools that can be called from the command line:
  • com.lowagie.tools.concat_pdf
    This class can be used from the command line to concatenate existing PDF files.
    arguments: the filenames of the PDF documents you want to concatenate, followed by the filename of the destination file.
    Command line example:
    java -cp itext.jar com.lowagie.tools.concat_pdf Chap0101.pdf Chap0102.pdf Chap0103.pdf result.pdf
    result.pdf contains the first three examples from Chapter 1.
  • com.lowagie.tools.split_pdf
    This class can be used from the command line to split an existing PDF file into two new files.
    Remark: some information from the original file (for instance annotations) will get lost in the process!
    arguments: srcfile destfile1 destfile2 pagenumber
    Command line example:
    java -cp itext.jar com.lowagie.tools.split_pdf result.pdf result1.pdf result2.pdf 2
    result.pdf will be split into a one-page document result1.pdf and a two-page document result2.pdf (result2.pdf starts with the second page of result.pdf).
  • com.lowagie.tools.handout_pdf
    This class can be used from the command line to make handouts from an existing PDF file. You can choose the number of slides per page.
    arguments: srcfile destfile pages
    Command line example:
    java -cp itext.jar com.lowagie.tools.handout_pdf result.pdf handout.pdf 2
    handout.pdf is a two-page overview (2 slides per page) of the three-page document result.pdf.
  • com.lowagie.tools.encrypt_pdf
    This class can be used from the command line to encrypt a PDF file.
    arguments: input_file output_file user_password owner_password permissions 128|40
    permissions is a string of 8 digits, each 0 or 1. Each digit controls a particular permission:
    1. AllowPrinting
    2. AllowModifyContents
    3. AllowCopy
    4. AllowModifyAnnotations
    5. AllowFillIn (128 bit only)
    6. AllowScreenReaders (128 bit only)
    7. AllowAssembly (128 bit only)
    8. AllowDegradedPrinting (128 bit only)
    Example permissions to copy and print would be: 10100000
    Command line example:
    java -cp itext.jar com.lowagie.tools.encrypt_pdf Chap0101.pdf encrypted.pdf user master 00000000 128
    You will only be able to open the file encrypted.pdf if you know the password (= user). You won't be able to print the file, modify its contents, copy parts of it, and so on.
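
The 8-digit permissions argument of encrypt_pdf is easy to get wrong by one position, so it can help to decode a string before using it. The sketch below is not part of com.lowagie.tools: the class PermissionString and its method decode are hypothetical names, and the code only mirrors the digit order listed above. It does not check which permissions are valid for 40-bit encryption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that spells out an 8-digit permissions string. */
final class PermissionString {

    // Same order as the numbered list above: digit 1 first, digit 8 last.
    private static final String[] NAMES = {
        "AllowPrinting", "AllowModifyContents", "AllowCopy", "AllowModifyAnnotations",
        "AllowFillIn", "AllowScreenReaders", "AllowAssembly", "AllowDegradedPrinting"
    };

    /** Returns the names of all permissions whose digit is 1. */
    static List<String> decode(String permissions) {
        if (permissions == null || !permissions.matches("[01]{8}")) {
            throw new IllegalArgumentException(
                "expected exactly 8 digits, each 0 or 1, but got: " + permissions);
        }
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            if (permissions.charAt(i) == '1') {
                granted.add(NAMES[i]);
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        System.out.println(decode("10100000")); // prints [AllowPrinting, AllowCopy]
        System.out.println(decode("00000000")); // prints []
    }
}
```

The string 10100000 decodes to printing and copying, as in the example above, while 00000000 (used in the command line example) grants none of the eight permissions.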


