PDF2XL OCR lets you specify a format for each field or table column. The format is used when converting the document and, in Scanned Document Mode, when performing OCR on that field or column.
The format of a field or column has the following effects on its contents:
- PDF2XL OCR removes any characters from the field or column that aren't part of the selected format's character set. For example, if a field uses a numeric format, PDF2XL OCR removes all letters from that field during conversion. You can edit the character set of each format in the Format section of the Options dialog (accessible from the Start menu).
- When converting to Excel, each cell's display format matches the selected format.
- When performing OCR, the OCR engine limits the possible results to the selected format's character set.
If the selected format doesn't fit the actual data in the field or column, PDF2XL OCR ignores it. For example, if a field contains "12/12/12" and you select the Currency format, PDF2XL OCR recognizes that the value doesn't look like a currency amount and keeps treating it as a date.
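To make the character-set filtering described above more concrete, here is a minimal Python sketch of the idea. It is not PDF2XL OCR's actual implementation: the format names, the `CHARSETS` table, and the `apply_charset` helper are all hypothetical, and the real character sets are whatever is configured in the Format section of the Options dialog.

```python
import string

# Hypothetical character sets for illustration only; the real ones are
# configured in PDF2XL OCR's Options dialog and may differ.
CHARSETS = {
    "text": None,  # None means "keep every character"
    "number": set(string.digits + ".,-"),
    "currency": set(string.digits + ".,-$"),
    "date": set(string.digits + "/"),
    "time": set(string.digits + ":"),
}


def apply_charset(value: str, fmt: str) -> str:
    """Drop every character that isn't in the chosen format's character set."""
    allowed = CHARSETS[fmt]
    if allowed is None:
        return value
    return "".join(ch for ch in value if ch in allowed)


print(apply_charset("Total: 1,234.50 USD", "number"))  # 1,234.50
print(apply_charset("12:30 pm", "time"))               # 12:30
```

In this sketch a numeric field loses its letters, spaces, and colon, which mirrors the example of a numeric field having all letters removed during conversion.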
Additional notes:
- Text fields and columns usually retain their indentation and line wrapping. However, when converting to the clipboard or to a CSV file, line wrapping is ignored.
- The Image format is not available for predefined fields.
- The Number format's default setting is to display 4 decimal digits.
- The Currency format's default setting displays a $ sign before the number and only 2 decimal digits, and colors negative numbers red.
- The Date format's default setting is to display month/day/year.
- The Time format's default setting is to display hour:minute.
- Since the format affects the OCR results, it is recommended to perform OCR again after changing a field's or column's format in Scanned Document Mode. This repeats OCR only on the selected field or column, not on the whole document.
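The default display settings listed above can be approximated in Python to show roughly what a value looks like under each format. This is only an illustrative sketch: the helper names are hypothetical, and several details are assumptions not stated here, namely a four-digit year, a 24-hour clock, the minus sign placed before the $ sign, and no thousands separator. Red coloring of negative currency values is an Excel display attribute and is not reproduced.

```python
from datetime import datetime


def display_number(x: float) -> str:
    # Number format default: 4 decimal digits.
    return f"{x:.4f}"


def display_currency(x: float) -> str:
    # Currency format default: $ sign and 2 decimal digits.
    # Assumption: the minus sign goes before the $ sign.
    sign = "-" if x < 0 else ""
    return f"{sign}${abs(x):.2f}"


def display_date(d: datetime) -> str:
    # Date format default: month/day/year (four-digit year assumed).
    return d.strftime("%m/%d/%Y")


def display_time(d: datetime) -> str:
    # Time format default: hour:minute (24-hour clock assumed).
    return d.strftime("%H:%M")


stamp = datetime(2012, 12, 12, 9, 5)
print(display_number(3.14159))   # 3.1416
print(display_currency(-12.5))   # -$12.50
print(display_date(stamp))       # 12/12/2012
print(display_time(stamp))       # 09:05
```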