NCHLT Optical Character Recognition for South African Languages
Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/322'
dc.contact.email | Martin.Puttkammer@nwu.ac.za | |
dc.contact.name | Martin Puttkammer | |
dc.contributor.author | Martin Puttkammer | |
dc.contributor.author | Justin Hocking | |
dc.contributor.author | Roald Eiselen | |
dc.date.accessioned | 2018-02-05T20:22:45Z | |
dc.date.accessioned | 2018-03-05T17:46:33Z | |
dc.date.available | 2018-02-05T20:22:45Z | |
dc.date.available | 2018-03-05T17:46:33Z | |
dc.date.issued | 2017-02-23 | |
dc.description | An OCR system is an application that converts scanned paper documents into editable and searchable text. The engine analyses the structure of the document image and divides the page into elements such as blocks of text, tables and images. These blocks are used to identify character image patterns, from which the engine advances several hypotheses about what each character could be. These hypotheses are used to produce different character-, word- and line-level variations with associated probabilities. The set of hypotheses is then searched to find the most likely combination of characters, words and lines, which produces a textual representation of the image. | |
dc.format.medium | UTF8 | |
dc.identifier.citation | Hocking, J. and Puttkammer, M., 2016, November. Optical character recognition for South African languages. In Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), 2016 (pp. 1-5). IEEE. | |
dc.identifier.uri | https://hdl.handle.net/20.500.12185/322 | |
dc.language.iso | afr | |
dc.language.iso | eng | |
dc.language.iso | nbl | |
dc.language.iso | xho | |
dc.language.iso | zul | |
dc.language.iso | sot | |
dc.language.iso | nso | |
dc.language.iso | tsn | |
dc.language.iso | ssw | |
dc.language.iso | ven | |
dc.language.iso | tso | |
dc.languages | Afrikaans | |
dc.languages | English | |
dc.languages | isiNdebele | |
dc.languages | isiXhosa | |
dc.languages | isiZulu | |
dc.languages | Sesotho sa Leboa (Sepedi) | |
dc.languages | Setswana | |
dc.languages | Sesotho | |
dc.languages | Siswati | |
dc.languages | Tshivenda | |
dc.languages | Xitsonga | |
dc.media.type | Text | |
dc.project | NCHLT Text III | |
dc.publisher | North-West University | |
dc.publisher | Centre for Text Technology (CTexT) | |
dc.rights.license | Creative Commons Attribution 3.0 Unported License (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/za/ | |
dc.software.requirements | Tesseract-OCR | |
dc.title | NCHLT Optical Character Recognition for South African Languages | |
dc.type | Tools | |
dc.version | 1.0 | |
local.collection.primary | Resource Catalogue | |
local.collection.secondary | Resource Index |
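
The search step outlined in dc.description can be illustrated with a toy example. The sketch below is not the engine's actual implementation: the function name `best_reading`, the beam width, and the `lexicon_bonus` word-level adjustment are all assumptions made for illustration. It takes per-position character hypotheses with probabilities, keeps the most probable partial readings, and returns the most likely combination.

```python
import heapq
import math


def best_reading(position_hypotheses, lexicon_bonus=None, beam_width=5):
    """Return (text, score) for the most likely character combination.

    position_hypotheses: one dict per character position, mapping a
        candidate character to its probability (hypothetical input format).
    lexicon_bonus: optional dict mapping known words to a log-score bonus,
        standing in for word-level evidence (an assumption of this sketch).
    """
    lexicon_bonus = lexicon_bonus or {}
    beam = [(0.0, "")]  # (log probability so far, text so far)
    for candidates in position_hypotheses:
        # Extend every surviving partial reading with every candidate.
        expanded = [
            (score + math.log(prob), text + char)
            for score, text in beam
            for char, prob in candidates.items()
            if prob > 0
        ]
        # Keep only the most probable partial readings.
        beam = heapq.nlargest(beam_width, expanded)
    # Apply the word-level adjustment to the surviving full readings.
    rescored = [(score + lexicon_bonus.get(text, 0.0), text) for score, text in beam]
    score, text = max(rescored)
    return text, score


# Hypothetical character hypotheses for a three-letter word image.
hypotheses = [
    {"c": 0.45, "e": 0.55},
    {"a": 0.60, "o": 0.40},
    {"t": 0.90, "l": 0.10},
]
character_only, _ = best_reading(hypotheses)
with_lexicon, _ = best_reading(hypotheses, lexicon_bonus={"cat": 0.5})
```

With character probabilities alone the sketch picks the reading whose individual characters score highest; adding word-level evidence can change the result, which is why the description mentions character, word and line levels together.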
Files
Original bundle
- Name: nchlt_optical_character_recognition.zip
- Size: 103.81 MB
- Format: ZIP, an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
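
Since the record lists Tesseract-OCR as the software requirement, using the bundle might look roughly like the sketch below. The internal layout of the archive is not documented in this record, so several things here are assumptions: that the bundle contains Tesseract `.traineddata` model files, that a model is named after an ISO code such as `afr`, and the helper names `extract_bundle`, `build_tesseract_command` and `run_ocr`, which are hypothetical. The `tesseract` command-line options `-l` and `--tessdata-dir` are standard Tesseract options.

```python
import subprocess
import zipfile
from pathlib import Path


def extract_bundle(zip_path, target_dir):
    """Unpack the archive and list any Tesseract model files found in it.

    Assumes (unverified) that models are shipped as *.traineddata files.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return sorted(Path(target_dir).rglob("*.traineddata"))


def build_tesseract_command(image, output_base, lang, tessdata_dir=None):
    """Build the tesseract call: tesseract IMAGE OUTPUTBASE -l LANG [...]."""
    command = ["tesseract", str(image), str(output_base), "-l", lang]
    if tessdata_dir is not None:
        # Point Tesseract at the directory holding the extracted models.
        command += ["--tessdata-dir", str(tessdata_dir)]
    return command


def run_ocr(image, output_base, lang, tessdata_dir=None):
    """Run Tesseract (must be installed) and return the text output path."""
    command = build_tesseract_command(image, output_base, lang, tessdata_dir)
    subprocess.run(command, check=True)
    # Tesseract appends .txt to the output base for plain-text output.
    return Path(f"{output_base}.txt")
```

A call such as `run_ocr("page.png", "page", "afr", tessdata_dir=models[0].parent)` would then write the recognised text to `page.txt`, provided the assumed model name matches a file in the extracted bundle.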