OCR with ABBYY – Common Issues Classified by Language

ABBYY FineReader PDF 15 OCR Editor does a great job of recognizing characters in many languages. It takes most of the heavy work out of converting scanned PDFs into readable PDF files, and the results are good. There are, however, some recognition issues that follow a pattern. These will likely be addressed in future versions of the program. When working with ABBYY FineReader PDF 15 OCR Editor in multiple languages, it becomes obvious that each language has its own set of problem characters.

Here is a compilation of errors that may creep into the recognized text, organized by language. This list is bound to grow over time.

Polish

ABBYY FineReader PDF 15 OCR Editor:

  1. converts the “0” digit into the letter “O”
  2. converts “w” to various characters that don’t make sense in context
  3. does not always recognize the dot “.” following an abbreviated word
  4. confuses ł and Ł with l and L, and vice versa
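
Because items 1 and 4 follow a predictable pattern, they can often be caught in a post-processing pass over the exported text. The following is a minimal sketch in Python, not a feature of FineReader itself. The function names are hypothetical, and the vocabulary is assumed to be a set of known Polish words that you supply. The first function replaces the letter "O" with "0" only inside tokens that consist of digits and "O" and contain at least one digit. The second swaps l/ł only when exactly one variant of the word is in the vocabulary.

```python
import re
from itertools import product

# A token made only of digits and the letter "O", with at least one digit,
# is treated as a number in which "0" was misread as "O" (item 1).
_NUMERIC_TOKEN = re.compile(r"\b(?=[0-9O]*[0-9])[0-9O]+\b")

def fix_zero_as_letter_o(text: str) -> str:
    """Replace "O" with "0" inside digit-like tokens; leave real words alone."""
    return _NUMERIC_TOKEN.sub(lambda m: m.group().replace("O", "0"), text)

# Each character that OCR may confuse maps to itself plus its look-alike (item 4).
_SWAPS = {"l": "lł", "ł": "łl", "L": "LŁ", "Ł": "ŁL"}

def l_variants(word: str) -> list[str]:
    """Return every spelling of `word` obtained by swapping l/ł and L/Ł."""
    options = [_SWAPS.get(ch, ch) for ch in word]
    return ["".join(combo) for combo in product(*options)]

def fix_l_stroke(word: str, vocabulary: set[str]) -> str:
    """Correct an l/ł mix-up only when the fix is unambiguous.

    `vocabulary` is an assumption: a caller-supplied set of valid words.
    """
    if word in vocabulary:
        return word  # already a known word, do not touch it
    matches = [v for v in l_variants(word) if v in vocabulary]
    # Change the word only if exactly one variant is a known word.
    return matches[0] if len(matches) == 1 else word

if __name__ == "__main__":
    print(fix_zero_as_letter_o("Rok 2O19, OK"))
    print(fix_l_stroke("zloty", {"złoty"}))
```

Both functions are deliberately conservative: a token with no digits, or a word with zero or several possible matches, is left unchanged so that it can be reviewed by hand. Items 2 and 3 depend on context and are better handled by a spell checker or manual proofreading.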