Search
Now showing items 1-10 of 100
Format Normaliser 1.0.
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ - Resource Index
Normalises input files to txt, utf8, replaces smart quotes with straight quotes, removes empty lines, etc.
NCHLT Optical Character Recognition for South African Languages
(North-West University; Centre for Text Technology (CTexT), 2017-02-23) ~ - Resource Catalogue
An OCR system is an application that enables one to convert scanned paper documents into editable and searchable texts. The engine analyses the structure ...
Lwazi isiZulu TTS corpus
(Meraka Institute, CSIR, 2013-03-27) ~ - Resource Catalogue
Orthographic and phonemically aligned transcriptions
African Speech Technology isiZulu Text Corpus
(North-West University; Stellenbosch University; University of Transkei; University of Free State (Qwa-Qwa campus); Rhodes University; University of KwaZulu-Natal; University of Western Cape, 2015-01-07) ~ - Resource Catalogue
Monolingual text corpus developed during the African Speech Technology project.
Autshumato English-isiZulu Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Aligned parallel corpora for the language pair English-isiZulu. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
NCHLT Tagger
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
A graphical user interface and command line tool to automatically annotate running text with one or more linguistic tags:\n* Part of Speech\n* Named ...
NCHLT isiZulu Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
Phonetic aligner
(Meraka Institute, CSIR, 2013-07-01) ~ - Resource Index
Scripts for automatic phonetic alignment of speech corpora using hidden markov models (HMMs).
GNApp (VSep2009)
(Meraka Institute, CSIR, 2013-07-01) ~ - Resource Index
An augmentative and alternate communication (AAC) device which generates synthesised (or pre-recorded) speech as output based on icons. Available as a ...
South African Fonts
(Translate.org.za, 2015-01-28) ~ - Resource Index
The South African fonts collection is a set of open fonts that cover all characters needed by all 11 South African languages. The fonts ensure that all ...