Optical Character Recognition (OCR)
The transformation from analogue to digital!
OCR, or Optical Character Recognition, describes a technology that extracts machine-readable text from raster image files such as PDFs, JPGs, etc. The extracted data can then be processed further or used to automate processes such as invoice processing.
State of play
OCR is a core technology for many programmes that deal with document processing. It allows paper documents to be digitised and processed electronically, so relevant documents no longer have to be searched out and passed on by hand. Paper documents containing only machine-printed text can already be digitised completely and with a very high level of reliability. Among other things, OCR serves as the basis for document management systems, enabling further processing and automation of paper documents such as invoices or delivery notes.
How does OCR work?
OCR technology is based mainly on pattern recognition. First, the programme divides the image file into different categories of regions, distinguishing figures, tables and blocks of text. Once it has identified a text block, the next step is to analyse the individual letters. For the computer, these letters are initially just groups of pixels that it cannot yet work with. The programme identifies these pixel groups, compares them with an existing database of character patterns and converts them into characters. To improve the recognition of handwritten text, the results are then cross-checked against further databases at character level (ICR) and at word level (IWR).
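The comparison of pixel groups with a database of known characters can be illustrated with simple template matching. The sketch below assumes tiny, hypothetical 3×3 binary bitmaps (`TEMPLATES`) and scores each candidate by the fraction of agreeing pixels; real OCR engines use much larger images and richer features, so treat this only as an illustration of the idea.

```python
# Hypothetical 3x3 binary "glyphs" standing in for a character pattern database.
# Real OCR engines work with far larger images and learned features.
TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010"),
}


def similarity(a, b):
    """Fraction of pixels that agree between two equally sized bitmaps."""
    total = 0
    same = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            same += pixel_a == pixel_b
    return same / total


def recognise(glyph):
    """Return the template character that best matches the glyph, plus its score."""
    best = max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))
    return best, similarity(glyph, TEMPLATES[best])


# A noisy "T" with one pixel flipped in the bottom row
noisy = ("111",
         "010",
         "011")
print(recognise(noisy))  # best match 'T' (8 of 9 pixels agree)
```

Because the score is a match ratio rather than an exact comparison, a slightly damaged glyph is still assigned to the closest known character.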
Intelligent Character Recognition
Intelligent Character Recognition (ICR) describes error correction at character level: it checks whether an identified character makes sense in the context of the word. Typical examples are easily confused characters such as ‘O’ and ‘0’ (zero) or ‘B’ and ‘8’. If the digit 0 is detected in the German word “Oder” (“or”), ICR can replace it with the correct letter “O”.
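A minimal sketch of this character-level correction, assuming a simple majority rule as the "context": if most characters in a token are letters, confusable digits are turned into letters, and vice versa. The confusion table `DIGIT_TO_LETTER` and the function `correct_token` are hypothetical illustrations; real ICR systems use much richer context such as dictionaries or language models.

```python
# Hypothetical table of easily confused character pairs (digit -> letter).
DIGIT_TO_LETTER = {"0": "O", "8": "B", "1": "l", "5": "S"}
# Reverse direction (letter -> digit), derived from the same pairs.
LETTER_TO_DIGIT = {letter: digit for digit, letter in DIGIT_TO_LETTER.items()}


def correct_token(token):
    """Fix confusable characters based on whether the token looks like a word or a number."""
    letters = sum(ch.isalpha() for ch in token)
    digits = sum(ch.isdigit() for ch in token)
    if letters > digits:
        # Mostly letters: treat the token as a word, so digits are likely misreads.
        return "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in token)
    if digits > letters:
        # Mostly digits: treat the token as a number, so letters are likely misreads.
        return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in token)
    return token  # Ambiguous context: leave the token unchanged.


print(correct_token("0der"))  # Oder
print(correct_token("1B50"))  # 1850
```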
Intelligent Word Recognition
Intelligent Word Recognition (IWR) describes error correction at word level. Especially with handwritten running text, individual characters may not be recognisable by OCR technology. IWR therefore compares global characteristics of the whole word with a word database to increase the recognition rate.
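The word-level lookup can be sketched with Python's standard `difflib` module. Note that this swaps in plain string similarity (`difflib.get_close_matches`) for the "global characteristics" of the word shape that real IWR compares, and that `WORD_DATABASE`, `correct_word` and the `cutoff` value are hypothetical choices for illustration.

```python
import difflib

# Hypothetical word database; in practice this would be a domain lexicon.
WORD_DATABASE = ["invoice", "delivery", "note", "total", "amount", "date"]


def correct_word(raw, lexicon=WORD_DATABASE, cutoff=0.7):
    """Replace a poorly recognised word with its closest lexicon entry, if one is similar enough."""
    matches = difflib.get_close_matches(raw.lower(), lexicon, n=1, cutoff=cutoff)
    # Keep the raw OCR output if nothing in the lexicon is close enough.
    return matches[0] if matches else raw


print(correct_word("lnvoice"))   # invoice
print(correct_word("de1ivery"))  # delivery
```

The `cutoff` threshold trades off recall against the risk of "correcting" a word that is genuinely not in the database.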
Gradual introduction
Step 1: Technical infrastructure
Step 2: Sort paper documents
Step 3: Scan documents
Step 4: Software-supported file conversion
Opportunities for SMEs