
PDF2XL Enterprise Online Help


Scanned Document Mode

Extracting data from scanned documents, scanned PDF documents or image files requires a technology called OCR (Optical Character Recognition). For scanned PDF documents, when the user marks a table for extraction and PDF2XL Enterprise identifies that the document is a scanned PDF, PDF2XL Enterprise notifies the user (as shown below):

Once the user presses OK, PDF2XL Enterprise's OCR module runs the recognition process and switches to Scanned Document Mode, in which words the OCR process is uncertain about are underlined (as shown below):
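This help page does not say how PDF2XL Enterprise decides which words count as uncertain. A common approach in OCR tools is to compare each recognized word's confidence score against a cutoff. The minimal sketch below assumes that model: the `OcrWord` type, the 0.0 to 1.0 confidence scale and the 0.80 threshold are all hypothetical, and underscores stand in for the on-screen underline.

```python
from dataclasses import dataclass

# Hypothetical cutoff: PDF2XL Enterprise's actual rule is not documented here.
CONFIDENCE_THRESHOLD = 0.80


@dataclass
class OcrWord:
    text: str
    confidence: float  # assumed score from the OCR engine, 0.0 (no idea) to 1.0 (certain)


def uncertain_words(words, threshold=CONFIDENCE_THRESHOLD):
    """Return the words whose recognition confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


def render(words, threshold=CONFIDENCE_THRESHOLD):
    """Join the words into one line, wrapping uncertain words in underscores
    as a plain-text stand-in for the underline shown in the application."""
    return " ".join(
        f"_{w.text}_" if w.confidence < threshold else w.text for w in words
    )


if __name__ == "__main__":
    row = [
        OcrWord("Invoice", 0.98),
        OcrWord("1O24", 0.55),    # letter O misread for a zero
        OcrWord("Total", 0.91),
        OcrWord("$3,2S0", 0.62),  # letter S misread for a 5
    ]
    print(render(row))  # Invoice _1O24_ Total _$3,2S0_
```

Flagging rather than silently correcting low-confidence words leaves the final decision to the user, which is the purpose of the Validate Data step described below.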

For scanned documents and image files, the OCR process runs automatically when the scan finishes or when the image is opened.

To correctly convert the data from the scanned document, the user should:

  1. Continue defining the table boundaries, column locations, and row and cell locations until satisfied with the page layout definition
  2. Before converting, click the Validate Data button, which lets the user inspect the OCR recognition results and fix them if necessary
  3. Press the Convert button

While in the Scanned Document Mode, right-clicking the document will display the Scanned Document Context Menu (except while in the Scanned Document Validation Mode).

Note that if the user rotates the page, the OCR process will be run again.

The user can request a re-run of the OCR process on the whole page, the table layout, or the current selection using the OCR Menu or the context menu.



© 2009 Cogniview Ltd. All rights reserved.