15–16 Sept 2022
Europe/Bucharest timezone

Complete OCR Solution for Image Analysis of World War 2 Documents

16 Sept 2022, 14:20
20m
Paper presentation
Grid, Cloud & High Performance Computing in Science
Session 3A - Grid, Cloud & High Performance Computing

Speaker

Mr Nicolae Tarbă (University Politehnica of Bucharest)

Description

The field of Optical Character Recognition (OCR) comprises techniques focused mainly on document image analysis. Besides significantly speeding up everyday procedures, OCR plays a considerable role in preserving historical sources of information. Most World War 2 (WW2) documents are of great importance, notably for virtual archives, museums, and research. Their transcription therefore calls for an OCR-based method that is efficient yet non-invasive. This paper describes an approach to this problem. It focuses on extracting information from documents degraded by age but with simple layouts, mainly split into paragraphs, such as letters and military reports. The approach combines the outputs of multiple OCR engines, with the goal of outperforming each engine used individually.
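The abstract does not say how the engines' results are merged. As a minimal illustrative sketch (an assumption, not the authors' method), one simple combination strategy is consensus selection: given several engines' transcriptions of the same text line, keep the candidate with the smallest total Levenshtein distance to all the others. The function names `levenshtein`, `consensus_line`, and `combine_pages` are hypothetical, and the sketch assumes each engine's output has already been split into lines that correspond one-to-one across engines.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when chars match)
            ))
        prev = curr
    return prev[-1]


def consensus_line(candidates: Sequence[str]) -> str:
    """Return the candidate with the lowest summed edit distance to all candidates.

    Ties are broken in favor of the engine listed first.
    """
    if not candidates:
        raise ValueError("need at least one OCR output")
    best, best_cost = candidates[0], None
    for cand in candidates:
        cost = sum(levenshtein(cand, other) for other in candidates)
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best


def combine_pages(engine_outputs: Sequence[Sequence[str]]) -> list[str]:
    """Merge several engines' outputs line by line.

    ASSUMPTION: line k of every engine transcribes the same text line;
    zip() stops at the shortest output if the line counts differ.
    """
    return [consensus_line(lines) for lines in zip(*engine_outputs)]


if __name__ == "__main__":
    # Hypothetical outputs of three engines for a two-line letter fragment.
    engines = [
        ["Dear Sir,", "the regiment moved north."],
        ["Dear Sir,", "the regirnent moved north."],
        ["Dcar Sir,", "the regiment moved n0rth."],
    ]
    print(combine_pages(engines))
    # → ['Dear Sir,', 'the regiment moved north.']
```

Choosing whole lines keeps the sketch short but is coarse: an engine's error can only be rejected, never corrected. Finer-grained schemes align the outputs character by character and vote at each position, at the cost of a more involved alignment step.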

Authors

Mr Mihai Grădinaru (University Politehnica of Bucharest)
Mr Andrei Negru (University Politehnica of Bucharest)
Prof. Costin-Anton Boiangiu (University Politehnica of Bucharest)
Mr Nicolae Tarbă (University Politehnica of Bucharest)
Mr Mihai Voncilă (University Politehnica of Bucharest)
Prof. Răzvan-Adrian Deaconescu (University Politehnica of Bucharest)