European Digital Innovation Hub Saarland

Optical Character Recognition (OCR)

The transformation from analogue to digital!

OCR, or Optical Character Recognition, describes a technology that can create text files from raster/image files such as PDFs, JPGs, etc. The conversion allows the extracted data to be processed further or used to automate processes such as invoice processing.

Prototypes and demonstrators available
Cross-industry deployment
Suitable for SMEs?

State of play

OCR is a basic technology for many programs that process documents. It makes it possible to digitise and further process documents that exist only on paper, which eliminates the need to transfer relevant documents manually by typing them out. Paper documents containing only machine-printed text can already be digitised completely and with a very high level of reliability. Among other things, OCR serves as the basis for document management systems, enabling further processing and automation of paper documents such as invoices or delivery notes.
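To illustrate what "further processing" of OCR output can look like, here is a minimal sketch that pulls an invoice number and a total amount out of recognised text using regular expressions. The field labels ("Invoice no.", "Total"), the currency handling and the function names are assumptions made for this example. Real invoices vary widely in layout and wording, so this should not be read as how any particular document management system works.

```python
import re
from decimal import Decimal

# Hypothetical label patterns; real invoices use many different layouts and wordings.
INVOICE_NO = re.compile(r"Invoice\s+(?:no\b\.?|number\b)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE)
# \b keeps "Subtotal" from matching; the amount must start and end with a digit.
TOTAL = re.compile(r"\bTotal\s*:?\s*(?:EUR|€)?\s*([0-9](?:[0-9.,]*[0-9])?)", re.IGNORECASE)


def parse_amount(raw: str) -> Decimal:
    """Turn '1.234,56' (German style) or '1,234.56' (English style) into a Decimal.

    Ambiguous inputs such as '1,234' are read as German-style decimals (1.234).
    """
    if "," in raw and raw.rfind(",") > raw.rfind("."):
        raw = raw.replace(".", "").replace(",", ".")
    else:
        raw = raw.replace(",", "")
    return Decimal(raw)


def extract_invoice_fields(ocr_text: str) -> dict:
    """Return the invoice number and total found in plain OCR text (None if missing)."""
    number = INVOICE_NO.search(ocr_text)
    total = TOTAL.search(ocr_text)
    return {
        "invoice_no": number.group(1) if number else None,
        "total": parse_amount(total.group(1)) if total else None,
    }
```

For OCR text containing the lines "Invoice No.: RE-2024-017" and "Total: EUR 1.234,56", the sketch yields the invoice number `RE-2024-017` and the amount `Decimal("1234.56")`. These values could then be passed on, for example, to an accounting step.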

How does OCR work?

OCR technology is characterised above all by pattern recognition. The program first divides the image file into different categories, distinguishing between figures, tables and text blocks. Once it has identified a text block, the next step is to analyse the individual letters. To the computer, these letters are initially just groupings of pixels that it cannot yet work with. The program recognises these groupings (OCR), compares them with an existing database and converts them into characters. To improve the recognition of handwritten text, the groupings are then additionally checked against further databases at character level (ICR) and word level (IWR).
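The two letter-level steps described above are finding groupings of pixels and comparing them with stored patterns. Both can be sketched in a few lines. The toy example below makes several assumptions. The image is already binarised and given as strings of '#' (ink) and '.' (background). Each character is one connected pixel group, found with a flood fill over the 4-neighbourhood. The "existing database" is a tiny set of hypothetical 3×5 templates, matched by counting differing pixels. Real OCR engines use far larger pattern sets and more robust features, so this only illustrates the principle.

```python
from collections import deque

# Hypothetical 3x5 templates standing in for the database of character patterns.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def pixel_groups(image):
    """Return connected groups of ink pixels (4-neighbourhood), ordered left to right."""
    height, width = len(image), len(image[0])
    seen, groups = set(), []
    for y in range(height):
        for x in range(width):
            if image[y][x] != "#" or (y, x) in seen:
                continue
            # Flood fill from this pixel to collect its whole group.
            group, queue = [], deque([(y, x)])
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                group.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < height and 0 <= nx < width and image[ny][nx] == "#" and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            groups.append(group)
    return sorted(groups, key=lambda g: min(x for _, x in g))


def crop(group):
    """Cut the group's bounding box out as a list of '#'/'.' rows."""
    ys = [y for y, _ in group]
    xs = [x for _, x in group]
    top, left = min(ys), min(xs)
    rows = [["."] * (max(xs) - left + 1) for _ in range(max(ys) - top + 1)]
    for y, x in group:
        rows[y - top][x - left] = "#"
    return ["".join(row) for row in rows]


def classify(glyph):
    """Return the same-sized template with the fewest differing pixels, or '?' if none fits."""
    best, best_cost = "?", None
    for char, tmpl in TEMPLATES.items():
        if len(tmpl) != len(glyph) or len(tmpl[0]) != len(glyph[0]):
            continue
        cost = sum(a != b for t_row, g_row in zip(tmpl, glyph) for a, b in zip(t_row, g_row))
        if best_cost is None or cost < best_cost:
            best, best_cost = char, cost
    return best


def recognise(image):
    """Segment the image into pixel groups and convert each group into a character."""
    return "".join(classify(crop(group)) for group in pixel_groups(image))
```

Because `classify` picks the closest template rather than requiring an exact match, a glyph with a single wrong pixel is still assigned to the nearest character. This is where the later ICR/IWR checks come in when two candidates are equally close.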

Intelligent Character Recognition

Intelligent Character Recognition (ICR) describes error correction at character level. It checks whether a recognised character makes sense in the context of the word. Examples are easily confused characters such as ‘O’ and ‘0’ (zero) or ‘B’ and ‘8’. For instance, the digit 0 might be recognised in the word “Or”; ICR would then replace it with the correct letter “O”.
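A character-level context check of this kind can be approximated with a simple rule. Only characters that cannot be confused reveal whether a token is a word or a number. Look-alike characters are then converted to fit that majority. The confusion table below covers only the pairs named above. Both the table and the rule are assumptions for illustration, not the method of any particular ICR product.

```python
# Hypothetical confusion tables, limited to the look-alike pairs O/0 and B/8.
DIGIT_TO_LETTER = {"0": "O", "8": "B"}
LETTER_TO_DIGIT = {"O": "0", "o": "0", "B": "8"}
CONFUSABLE = set(DIGIT_TO_LETTER) | set(LETTER_TO_DIGIT)


def correct_token(token: str) -> str:
    """Replace look-alike characters so that they fit the rest of the token."""
    # Only unambiguous characters count as evidence for "word" or "number".
    letters = sum(ch.isalpha() for ch in token if ch not in CONFUSABLE)
    digits = sum(ch.isdigit() for ch in token if ch not in CONFUSABLE)
    if letters > digits:
        table = DIGIT_TO_LETTER  # looks like a word: digits become letters (0 -> upper-case O)
    elif digits > letters:
        table = LETTER_TO_DIGIT  # looks like a number: letters become digits
    else:
        return token  # no clear evidence either way, leave unchanged
    return "".join(table.get(ch, ch) for ch in token)


def correct_text(text: str) -> str:
    """Apply the token-level correction to whitespace-separated tokens."""
    return " ".join(correct_token(tok) for tok in text.split())
```

With this rule, `correct_text("0r 2O24")` returns `"Or 2024"`. In "0r" the unambiguous letter "r" marks the token as a word. In "2O24" the digits 2, 2 and 4 mark it as a number.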

Intelligent Word Recognition

Intelligent Word Recognition (IWR) describes error correction at word level. Particularly with handwritten continuous text, individual characters may not be recognisable by OCR technology at all. In such cases, IWR compares global characteristics of the word with a word database in order to increase the correct recognition rate.
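IWR itself works on global features of the word image. The following sketch only mimics the idea on text. It assumes the character-level recogniser marks unreadable characters with '?'. First, it looks for words in a hypothetical lexicon that fit the readable characters. If that yields no unique hit, it falls back to overall string similarity using Python's standard `difflib` module. The lexicon and the 0.6 similarity cutoff are illustrative assumptions.

```python
import difflib

# Hypothetical word database; a real system would use a large, domain-specific lexicon.
LEXICON = ["invoice", "delivery", "note", "amount", "total", "date"]


def recognise_word(hypothesis, lexicon=LEXICON):
    """Map a partly unreadable OCR hypothesis to a lexicon word, or return None.

    '?' marks a character the character-level recogniser could not read.
    """
    hyp = hypothesis.lower()
    # 1) Whole-word pattern match: same length, every readable character agrees.
    candidates = [
        word for word in lexicon
        if len(word) == len(hyp) and all(h in ("?", c) for h, c in zip(hyp, word))
    ]
    if len(candidates) == 1:
        return candidates[0]
    # 2) No unique fit: rank by overall string similarity instead.
    pool = candidates or lexicon
    matches = difflib.get_close_matches(hyp.replace("?", ""), pool, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

For example, `recognise_word("inv?ice")` resolves to `"invoice"` via the pattern match. `recognise_word("de1ivery")`, in which "l" was misread as "1", is recovered as `"delivery"` through the similarity fallback.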

Step-by-step introduction

Step 1: Technical infrastructure

OCR technology can create a digital text file from a paper document. However, a raster/image file of the document must first be created. This is done using a scanner. Scanners can be purchased as stand-alone devices, but modern printers are often already equipped with one. In addition, a computer and a storage medium on which the file can be saved are required: either the computer's hard disk or online cloud storage.

Step 2: Sort paper documents

To enable efficient processing, the documents to be digitised should be sorted sensibly. The sorting should be guided by the subsequent processing step, as this makes further work easier.
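One way to sort with the subsequent processing step in mind is to group documents by the workflow that will handle them, for example invoices for accounting. Each group can then be ordered by date. The sketch below does this for a hypothetical list of document records and suggests a file name for each scan. The categories, fields and naming scheme are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass
class Document:
    kind: str     # hypothetical category, e.g. "invoice" or "delivery_note"
    issued: date  # date printed on the document


def scan_batches(documents):
    """Group documents by kind, order each batch by date and suggest scan file names.

    Returns a dict mapping each kind to its file names in scan order.
    """
    # groupby only merges adjacent items, so sort by kind first (then by date).
    ordered = sorted(documents, key=lambda d: (d.kind, d.issued))
    batches = {}
    for kind, docs in groupby(ordered, key=lambda d: d.kind):
        batches[kind] = [
            f"{kind}/{doc.issued.isoformat()}_{index:03d}.png"
            for index, doc in enumerate(docs, start=1)
        ]
    return batches
```

Keeping one batch per workflow means each folder of scans can be handed straight to the team or tool that processes it next.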

Step 3: Scan documents

In this step, the scanner is used to scan the document and create a raster/image file that is saved locally on the computer.

Step 4: Software-assisted file conversion

Once the image file has been created, it can be converted. This requires one of the many available OCR tools, which turn the raster file into a text file in a familiar format such as .docx or .txt with just a few clicks. The result can then be edited in common word-processing programs.
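As an illustration of this step, the sketch below converts every scanned image in a folder into a .txt file. It assumes the open-source Tesseract engine, accessed through the `pytesseract` and `Pillow` packages. These are imported only when `tesseract_ocr` is actually called. The folder names and function names are hypothetical, and any other OCR tool could be plugged in through the `ocr` parameter. PDF files are skipped here because they would first need to be rendered as images. Producing .docx output would require an additional library and is not shown.

```python
from pathlib import Path
from typing import Callable

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def tesseract_ocr(image_path: Path) -> str:
    """Recognise text with Tesseract (assumes Tesseract, pytesseract and Pillow are installed)."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def convert_folder(scan_dir, out_dir, ocr: Callable[[Path], str] = tesseract_ocr) -> list[Path]:
    """Write one UTF-8 .txt file per scanned image in scan_dir and return the written paths.

    Images sharing a name stem (e.g. a.png and a.jpg) would overwrite each other's output.
    """
    scan_dir, out_dir = Path(scan_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(scan_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip PDFs, notes and other non-image files
        target = out_dir / f"{image_path.stem}.txt"
        target.write_text(ocr(image_path), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_folder("scans", "text")` would, for example, turn `scans/invoice_001.png` into `text/invoice_001.txt`. Passing a different function as `ocr` swaps the engine without touching the file handling.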

Opportunities for SMEs

Conversion of analogue media to digital

First step towards automation of processes

Simplified document management

Avoidance of transcription errors

Contact

Do you need support with introducing this technology in your company?

Contact us!

Keep an eye on the most important SME-relevant technologies with our technology radar!

European Digital Innovation Hub Saarland
  • Address
    c/o ZeMA, Eschberger Weg 46, D-66121 Saarbrücken
  • Telephone
    +49 (0) 681 85787 – 300
  • E-mail
    info@edih-saarland.de

The European Digital Innovation Hub Saarland (EDIH Saarland) is funded up to 50% by EU funds (GA 101083337) and by the Saarland Ministry of Economic Affairs, Innovation, Digital and Energy. EDIH Saarland offers SMEs in the region a free one-stop shop for digitization and the application of artificial intelligence (AI). Over the next three years (2023-2025), it will provide significant expertise for the practical transfer of industrial AI in Saarland, the Greater Region (Saar-Lor-Lux) and Europe.

ZeMA leads the project, together with the participating project partners AWSi, DFKI, saaris and East Side Fab.


© European Digital Innovation Hub Saarland
