Format Normaliser 1.0.
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) - Resource Index
Normalises input files to UTF-8 plain text (.txt), replaces smart quotes with straight quotes, removes empty lines, etc.
NCHLT Tshivenda Speech Corpus
(Meraka Institute, CSIR; North-West University, 2014-07-08) - Resource Catalogue
Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
NCHLT Optical Character Recognition for South African Languages
(North-West University; Centre for Text Technology (CTexT), 2017-02-23) - Resource Catalogue
An OCR system is an application that enables one to convert scanned paper documents into editable and searchable texts. The engine analyses the structure ...
NCHLT Tshivenda Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Tagger
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) - Resource Catalogue
A graphical user interface and command line tool to automatically annotate running text with one or more linguistic tags: Part of Speech, Named ...
Phonetic aligner
(Meraka Institute, CSIR, 2013-07-01) - Resource Index
Scripts for automatic phonetic alignment of speech corpora using hidden Markov models (HMMs).
NCHLT Tshivenda Auxiliary Speech Corpus
(CSIR Meraka Institute; North-West University, 2019-06-01) - Resource Catalogue
The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
TurboAnnotate1.0
(Centre for Text Technology (CTexT), 2013-07-01) - Resource Index
TurboAnnotate is a user-friendly annotating environment (i.e. tool) for bootstrapping linguistic data for machine-learning purposes, or for manually ...
South African Fonts
(Translate.org.za, 2015-01-28) - Resource Index
The South African fonts collection is a set of open fonts that cover all characters needed by all 11 South African languages. The fonts ensure that all ...
NCHLT South African Language Identifier
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) - Resource Catalogue
A graphical user interface and command line tool to automatically classify a document, paragraph, sentence or phrase as one of the eleven official South ...