NCHLT Optical Character Recognition for South African Languages

dc.contact.email: Martin.Puttkammer@nwu.ac.za
dc.contact.name: Martin Puttkammer
dc.contributor.author: Martin Puttkammer
dc.contributor.author: Justin Hocking
dc.contributor.author: Roald Eiselen
dc.date.accessioned: 2018-02-05T20:22:45Z
dc.date.accessioned: 2018-03-05T17:46:33Z
dc.date.available: 2018-02-05T20:22:45Z
dc.date.available: 2018-03-05T17:46:33Z
dc.date.issued: 2017-02-23
dc.description: An OCR system is an application that converts scanned paper documents into editable, searchable text. The engine analyses the structure of the document image and divides the page into elements such as blocks of text, tables and images. Within these blocks it identifies character image patterns and advances several hypotheses about which character each pattern could be. These hypotheses produce alternative readings at the character, word and line level, each with an associated probability. The engine then searches this set of hypotheses for the most likely combination of characters, words and lines, which becomes the textual representation of the image.
dc.format.medium: UTF8
dc.identifier.citation: Hocking, J. and Puttkammer, M., 2016, November. Optical character recognition for South African languages. In Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), 2016 (pp. 1-5). IEEE.
dc.identifier.uri: https://hdl.handle.net/20.500.12185/322
dc.language.iso: afr
dc.language.iso: eng
dc.language.iso: nbl
dc.language.iso: xho
dc.language.iso: zul
dc.language.iso: sot
dc.language.iso: nso
dc.language.iso: tsn
dc.language.iso: ssw
dc.language.iso: ven
dc.language.iso: tso
dc.languages: Afrikaans
dc.languages: English
dc.languages: isiNdebele
dc.languages: isiXhosa
dc.languages: isiZulu
dc.languages: Sesotho sa Leboa (Sepedi)
dc.languages: Setswana
dc.languages: Sesotho
dc.languages: Siswati
dc.languages: Tshivenda
dc.languages: Xitsonga
dc.media.type: Text
dc.project: NCHLT Text III
dc.publisher: North-West University
dc.publisher: Centre for Text Technology (CTexT)
dc.rights.license: Creative Commons Attribution 3.0 Unported License (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/za/
dc.software.requirements: Tesseract-OCR
dc.title: NCHLT Optical Character Recognition for South African Languages
dc.type: Tools
dc.version: 1.0
local.collection.primary: Resource Catalogue
local.collection.secondary: Resource Index
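
The final step in the description, searching a set of weighted hypotheses for the most likely reading, can be illustrated with a small beam search. This is only a sketch under stated assumptions, not the method used by this tool or by Tesseract: the function name `most_likely_text`, the `lexicon_bonus` weighting and the sample probabilities are all hypothetical and chosen purely for illustration.

```python
import heapq
import math


def most_likely_text(hypotheses, lexicon=frozenset(), lexicon_bonus=2.0, beam_width=5):
    """Pick the most likely string from per-character hypotheses.

    hypotheses: one list per character position, each holding
                (character, probability) candidates.
    lexicon:    known words; a reading found here has its probability
                multiplied by lexicon_bonus (an illustrative assumption).
    """
    # Each beam entry is (log-probability, text built so far).
    beam = [(0.0, "")]
    for candidates in hypotheses:
        expanded = [
            (score + math.log(prob), text + char)
            for score, text in beam
            for char, prob in candidates
            if prob > 0
        ]
        # Keep only the best few partial readings.
        beam = heapq.nlargest(beam_width, expanded)

    def final_score(entry):
        score, text = entry
        return score + (math.log(lexicon_bonus) if text in lexicon else 0.0)

    return max(beam, key=final_score)[1]


# Hypothetical recogniser output for a four-character word image,
# where the first glyph is ambiguous between "1" and "l".
sample = [
    [("1", 0.55), ("l", 0.45)],
    [("i", 0.90), ("l", 0.10)],
    [("n", 0.80), ("r", 0.20)],
    [("e", 0.95), ("c", 0.05)],
]

print(most_likely_text(sample))                    # character evidence only
print(most_likely_text(sample, lexicon={"line"}))  # with word-level evidence
```

With character probabilities alone the search returns the visually strongest reading; adding word-level evidence lets a slightly less probable character sequence win when it forms a known word, which is the kind of character, word and line level combination the description refers to.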

Files

Original bundle

Name: nchlt_optical_character_recognition.zip
Size: 103.81 MB
Format: ZIP (an archive file format that supports lossless data compression; a ZIP file may contain one or more files or directories, which may have been compressed)