Format Normaliser 1.0.
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) - Resource Index
Normalises input files to UTF-8 encoded plain text (txt), replaces smart quotes with straight quotes, removes empty lines, etc.
NCHLT Optical Character Recognition for South African Languages
(North-West University; Centre for Text Technology (CTexT), 2017-02-23) - Resource Catalogue
An OCR system is an application that enables one to convert scanned paper documents into editable and searchable texts. The engine analyses the structure ...
Lwazi II Proper Name Call Routing Telephone Corpus
(Meraka Institute, CSIR; North-West University, 2015-11-20) - Resource Catalogue
Short prompts of proper names and language names collected via the telephone network.
Autshumato English-isiZulu Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Aligned parallel corpora for the language pair English-isiZulu. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
Autshumato English-Xitsonga Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Aligned parallel corpora for the language pair English-Xitsonga. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
UNISA Multilingual Corpus
(University of South Africa, 2018-02-28) - Resource Index
The resource comprises a diverse selection of TEI P5 marked-up documents of institutional origin, written in Tswana, Afrikaans and English. This is ...
Phonetic aligner
(Meraka Institute, CSIR, 2013-07-01) - Resource Index
Scripts for automatic phonetic alignment of speech corpora using hidden Markov models (HMMs).
GNApp (VSep2009)
(Meraka Institute, CSIR, 2013-07-01) - Resource Index
An augmentative and alternative communication (AAC) device which generates synthesised (or pre-recorded) speech as output based on icons. Available as a ...
South African Fonts
(Translate.org.za, 2015-01-28) - Resource Index
The South African fonts collection is a set of open fonts that cover all characters needed by all 11 South African languages. The fonts ensure that all ...
NCHLT South African Language Identifier
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) - Resource Catalogue
A graphical user interface and command line tool to automatically classify a document, paragraph, sentence or phrase as one of the eleven official South ...