PDF2XL Enterprise Online Help
Scanned Document Mode
Extracting data from scanned documents, scanned PDF documents, or image files requires a technology called OCR (Optical Character Recognition). For scanned PDF documents, when the user marks a table for extraction and PDF2XL Enterprise detects that the document is a scanned PDF, PDF2XL Enterprise notifies the user of this (as shown below):

Once the user presses OK, the PDF2XL Enterprise OCR module runs the recognition process and switches to Scanned Document Mode, in which words that the OCR process is uncertain about are underlined (as shown below):

For scanned documents and image files, the OCR process runs automatically when scanning finishes or when the image is opened.
To convert the data from a scanned document correctly, the user should:
- Continue defining the table boundaries, column locations, and row and cell locations until satisfied with the page layout definition
- Before converting, click the Validate Data button, which lets the user inspect the OCR recognition results and fix them if necessary
- Press the Convert button

While in Scanned Document Mode, right-clicking the document displays the Scanned Document Context Menu (except in Scanned Document Validation Mode).
Note that if the user rotates the page, the OCR process runs again.
The user can request a re-run of the OCR process on the whole page, the table layout, or the current selection using the OCR Menu or the context menu.