Optical Character Recognition (OCR)
The transformation from analog to digital!
OCR, or Optical Character Recognition, is a technology that creates text files from raster or image files such as PDFs and JPGs. The conversion makes it possible to process the extracted data further or to automate workflows such as invoice processing.
The current status
OCR is a core technology for many programs that process documents. It makes it possible to digitize and prepare documents that exist only on paper, which eliminates the need to transfer their content manually by retyping it. Documents consisting exclusively of machine print can already be digitized completely and with a very high degree of reliability. Among other things, OCR serves as the basis for document management systems, where it enables further processing and automation for paper documents such as invoices or delivery notes.
How does OCR work?
OCR technology relies above all on pattern recognition. It first divides the image file into regions of different categories: figures, tables and text blocks. Once the program has identified a text block, the next step is to analyze the individual letters. For the computer, these letters are initially just groupings of pixels that it cannot yet work with. The program recognizes these groupings (OCR), compares them with an existing database and converts them into characters. To improve the recognition of handwritten text, the results are then additionally checked at character level (ICR) and word level (IWR) against a further database.
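The comparison of pixel groupings with a database can be illustrated with a minimal sketch. It assumes tiny 5x3 black-and-white bitmaps and a hypothetical three-entry template database (TEMPLATES); real OCR engines use far richer features, so this is only meant to show the matching idea, not how any particular product works.

```python
# Hypothetical 5x3 glyph bitmaps standing in for the "existing database".
# "1" is a dark pixel, "0" a light pixel.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph, templates=TEMPLATES):
    """Return the character whose template differs from the glyph in the fewest pixels."""
    return min(templates, key=lambda ch: hamming(glyph, templates[ch]))


# A scanned "T" with one noisy pixel in the fourth row.
noisy_t = ("111", "010", "010", "110", "010")
print(recognize(noisy_t))
```

The nearest template wins even when the scan is not a perfect match, which is why small amounts of noise do not necessarily change the result.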
Intelligent Character Recognition
Intelligent Character Recognition (ICR) describes error correction at character level. It checks whether a recognized character makes sense in the context of the word. Typical examples are the easily confused characters “O” and “0” (zero) or “B” and “8”. If, for instance, the digit 0 is recognized in the word “Or”, ICR replaces it with the correct letter “O”.
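The following sketch shows one possible way to express such a context check in code. The look-alike tables, the small VOCABULARY and the function names are assumptions made for illustration; they are not taken from any specific ICR product.

```python
# Assumed look-alike pairs, taken from the examples in the text.
DIGIT_TO_LETTER = {"0": "O", "8": "B"}
LETTER_TO_DIGIT = {"O": "0", "B": "8"}

# Hypothetical vocabulary; a real system would use a full dictionary.
VOCABULARY = {"or", "bill", "order"}


def is_plausible(token):
    """A token makes sense if it is a pure number or a known word."""
    return token.isdigit() or token.lower() in VOCABULARY


def correct_token(token):
    """Swap look-alike characters if that turns the token into something plausible."""
    if is_plausible(token):
        return token
    as_letters = "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in token)
    as_digits = "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in token)
    for candidate in (as_letters, as_digits):
        if is_plausible(candidate):
            return candidate
    return token  # nothing plausible found, keep the raw recognition result


print(correct_token("0r"))    # zero recognized instead of the letter O
print(correct_token("2O24"))  # letter O recognized instead of zero
```

Tokens that cannot be turned into a known word or a number are left unchanged, so the correction never invents text on its own.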
Intelligent Word Recognition
Intelligent Word Recognition (IWR) describes error correction at word level. Particularly with handwritten continuous text, the OCR technology may be unable to recognize the individual characters. IWR therefore compares global characteristics of the whole word with a word database in order to increase the recognition rate.
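As a rough illustration of matching at word level, the sketch below compares a garbled recognition result with a small word database. It assumes that plain string similarity from Python's standard difflib module can stand in for the global word characteristics mentioned above; LEXICON and match_word are hypothetical names chosen for this example.

```python
from difflib import get_close_matches

# Hypothetical word database for an invoice-processing scenario.
LEXICON = ["invoice", "delivery", "amount", "customer"]


def match_word(raw, lexicon=LEXICON, cutoff=0.7):
    """Return the most similar lexicon word, or the raw text if nothing is close enough."""
    matches = get_close_matches(raw.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


print(match_word("invoce"))   # one character missing
print(match_word("custmer"))  # one character missing
```

The cutoff value controls how aggressive the correction is: a higher value leaves more words untouched, a lower value risks replacing correct but unknown words.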
Step-by-step introduction
Step 1: Technical infrastructure
Step 2: Sort paper documents
Step 3: Scan documents
Step 4: Software-assisted conversion of the files
Opportunities for SMEs