NCHLT Optical Character Recognition for South African Languages

Martin Puttkammer; Justin Hocking; Roald Eiselen

Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/322'

Files

nchlt_optical_character_recognition.zip (103.81 MB)

Date

2017-02-23

Authors

Martin Puttkammer

Justin Hocking

Roald Eiselen

Publisher

North-West University
Centre for Text Technology (CTexT)

Description

An optical character recognition (OCR) system converts scanned paper documents into editable, searchable text. The engine analyses the structure of the document image and divides the page into elements such as blocks of text, tables and images. Within these blocks it identifies character image patterns and advances several hypotheses about which character each pattern could be. From these hypotheses it produces character-, word- and line-level variants with associated probabilities. The set of hypotheses is then searched for the most likely combination of characters, words and lines, which yields a textual representation of the image.
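
The final search step in the description can be illustrated with a toy sketch in Python. It is not the implementation used by the NCHLT system: the function name `most_likely_word`, the hypothesis probabilities, the one-word lexicon and the `lexicon_bonus` weight are all assumptions invented for illustration. The sketch enumerates every combination of per-position character hypotheses, scores each by its summed log-probability, and adds a bonus when the resulting string is a known word.

```python
import itertools
import math


def most_likely_word(char_hypotheses, lexicon, lexicon_bonus=2.0):
    """Return the best-scoring string and its score.

    char_hypotheses: one dict per character position, mapping a candidate
        character to its (hypothetical) recognition probability.
    lexicon: set of known words; a match adds `lexicon_bonus` to the score.
    """
    best_word, best_score = None, -math.inf
    # Exhaustively combine one candidate per position (fine for a toy input;
    # a real engine would prune the search, for example with a beam).
    for combo in itertools.product(*(h.items() for h in char_hypotheses)):
        word = "".join(ch for ch, _ in combo)
        score = sum(math.log(p) for _, p in combo)
        if word in lexicon:
            score += lexicon_bonus  # word-level evidence (assumed weight)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical character hypotheses for a four-character word image.
hypotheses = [
    {"b": 0.4, "h": 0.6},
    {"o": 0.9, "c": 0.1},
    {"e": 0.8, "c": 0.2},
    {"k": 0.7, "h": 0.3},
]

word, score = most_likely_word(hypotheses, lexicon={"boek"})
print(word)  # boek
```

With an empty lexicon the same input yields "hoek", because the highest-probability character wins at every position; the word-level evidence is what changes the outcome.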

Citation

Hocking, J. and Puttkammer, M., 2016, November. Optical character recognition for South African languages. In Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), 2016 (pp. 1-5). IEEE.

License

Creative Commons Attribution 3.0 Unported License (CC BY 3.0)

URI

https://hdl.handle.net/20.500.12185/322

Collections

Resource Catalogue
Resource Index

Verification status

Level 0
