FAQ - What does OCR stand for? : Product Support

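OCR stands for Optical Character Recognition. It is the process of converting images of text (for example, a scanned page) into machine-readable characters, and whether a document has been OCR'd determines whether or not it is text searchable.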

All documents containing text need to be OCR'd in order for them to be text searchable on Opus 2 - indeed, the platform is dependent on the OCR data within the PDF file for some features.

For example, this data enables a user to accurately search all content uploaded for specific words / phrases. It is worth noting that the accuracy of the OCR process is subject to the quality of the document being processed, and it is not uncommon for the process to give rise to errors such as the letter "O" being mistaken for the number "0", or the letter "l" being mistaken for the number "1".
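To illustrate how such errors affect searching, here is a minimal Python sketch of a search pattern that tolerates the two confusions mentioned above. It is not how Opus 2 searches documents; the names `CONFUSION_GROUPS` and `ocr_tolerant_pattern` are hypothetical, and the confusion groups cover only the "O"/"0" and "l"/"1" examples given here.

```python
import re

# Assumed confusion groups, taken only from the examples in this article:
# the letter "O" vs the number "0", and the letter "l" vs the number "1".
CONFUSION_GROUPS = ["O0", "l1"]


def ocr_tolerant_pattern(term: str) -> re.Pattern:
    """Build a regex that matches `term` even if OCR swapped a confusable character."""
    parts = []
    for ch in term:
        # Find the confusion group this character belongs to, if any.
        group = next((g for g in CONFUSION_GROUPS if ch in g), None)
        if group:
            # Accept any character from the group at this position.
            parts.append("[" + re.escape(group) + "]")
        else:
            # Otherwise match the character literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


# OCR output in which "full" was misread as "fu1l".
ocr_text = "The invoice was paid in fu1l."
print("full" in ocr_text)                                    # False
print(bool(ocr_tolerant_pattern("full").search(ocr_text)))  # True
```

A plain search for "full" misses the misread word, whereas the tolerant pattern still finds it. A document saved to PDF directly from its original format would not need this kind of workaround.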

As a general rule, it is better to produce a PDF from the document's original format where possible. For example, if a document is available both as an un-OCR'd PDF and as a Microsoft Word document, it is better to save the Word document as a PDF from within Word than to OCR the PDF.

This is because OCRing the PDF may give rise to errors like those above, whereas saving the Word file as a PDF ensures that the text data in the resulting PDF is accurate, as Word simply saves the characters as typed when it creates the PDF.

Product Support | UK
