OCR Software

Recognition for Machine Printed Characters

Optical Character Recognition (OCR) engines are designed to read typed information, generically known as "machine printed" characters.

Types of OCR Engines


The most common, and most powerful, type of OCR is the so-called "omni-font" engine. As its name suggests, an omni-font engine can read any font, whether it is a traditional serif font like Times Roman, a simple sans-serif like Helvetica (Arial), or one of the more stylized fonts commonly available on your desktop PC. Surprisingly, omni-font engines generally don't do as well on fonts designed specifically for recognition, such as OCR-A, because those fonts have peculiarities that set them apart from more standard fonts. For that reason, some OCR (and ICR) engines are trained specifically to read fonts such as OCR-A, OCR-B, Farrington 7B, CMC7, and MICR (as on checks).

What Are The Limits?


An OCR engine, even an omni-font engine, can't read everything. Most engines struggle with characters smaller than 8 points, and very large type (over 24 points) is often ignored altogether. The biggest barrier to good recognition, however, is the quality of the type. Unfortunately, not everything is printed on high-resolution laser printers (with full toner cartridges!). A lot of printing is still done on dot matrix printers (with faded ribbons!). When the dots that make up a character don't quite touch, or the type is very faint, expect OCR accuracy to fade with it.
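To see what these size limits mean on a scanned page, it helps to convert point sizes to pixels. A typographic point is 1/72 inch, so a character's nominal size in pixels is points × dpi / 72. The sketch below treats the 8 pt and 24 pt limits mentioned above as typical values only (real engines differ), and the function names are hypothetical. Keep in mind that point size measures the em height of the type, so the visible strokes of a lowercase letter occupy noticeably fewer pixels than the number computed here.

```python
POINTS_PER_INCH = 72  # a typographic (DTP/PostScript) point is 1/72 inch

# Typical limits quoted in the text; assumptions, not properties of any specific engine.
MIN_POINTS = 8
MAX_POINTS = 24


def points_to_pixels(points: float, dpi: float) -> float:
    """Return the nominal type size (em height) in pixels at a given scan resolution."""
    return points * dpi / POINTS_PER_INCH


def size_in_typical_range(points: float) -> bool:
    """True if the type size falls inside the 8-24 pt range most engines handle well."""
    return MIN_POINTS <= points <= MAX_POINTS


if __name__ == "__main__":
    # Print the pixel size of a few common type sizes at the usual 200 dpi.
    for pts in (6, 8, 12, 24, 36):
        px = points_to_pixels(pts, dpi=200)
        status = "ok" if size_in_typical_range(pts) else "outside typical range"
        print(f"{pts:>2} pt at 200 dpi = {px:5.1f} px ({status})")
```

At 200 dpi, for example, 8 pt type works out to roughly 22 pixels and 24 pt type to roughly 67 pixels, which gives a feel for how few pixels a small or faint character has to work with.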

Is Higher Resolution Required?


Most document scanning is done at 200 dots per inch (dpi), and most OCR engines have been tuned to this reality. That helps explain why raising the resolution does not always translate into an immediate improvement in accuracy. The best way to find out whether your application needs a higher resolution is to try it on a sample of your own documents.
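One way to run that trial is to scan the same sample pages at two resolutions, run both sets through your engine, and score each output against a hand-keyed transcription. The sketch below assumes you already have the recognized text as strings (how you obtain it depends on your engine and is not shown), and its function names are hypothetical. It measures character accuracy as 1 minus the Levenshtein edit distance divided by the length of the ground truth, a common but not the only way to score OCR output.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (no cost if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(truth: str, ocr_text: str) -> float:
    """Return 1 - (edit distance / ground-truth length), floored at 0."""
    if not truth:
        return 1.0 if not ocr_text else 0.0
    errors = levenshtein(truth, ocr_text)
    return max(0.0, 1.0 - errors / len(truth))


def compare_resolutions(truth: str, outputs_by_dpi: dict[int, str]) -> dict[int, float]:
    """Score the OCR text produced at each scan resolution against the same ground truth."""
    return {dpi: character_accuracy(truth, text)
            for dpi, text in sorted(outputs_by_dpi.items())}
```

Passing the transcription and a mapping such as `{200: text_at_200, 300: text_at_300}` to `compare_resolutions` returns one accuracy figure per resolution; if the higher resolution does not score meaningfully better across several pages, the extra scan time and storage may not be worth it.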