NCHLT Optical Character Recognition for South African Languages
(North-West University; Centre for Text Technology (CTexT), 2017-02-23)
An OCR system is an application that converts scanned paper documents into editable, searchable text. The engine analyses the structure ...
NCHLT South African Language Identifier
(North-West University; Centre for Text Technology (CTexT), 2016-04-29)
A graphical user interface and command-line tool that automatically classifies a document, paragraph, sentence or phrase as one of the eleven official South ...
Autshumato TMS
(North-West University; Centre for Text Technology (CTexT), 2013-06-19)
Terminology Management System (TMS). A web application used by terminologists and administrators to capture, edit and export terminology.
Autshumato PDF Text Extractor
(North-West University; Centre for Text Technology (CTexT), 2013-06-20)
Utility application for extracting text from a PDF document. Pages can also be extracted as images.
NCHLT-inlang Pronunciation Dictionaries
(Meraka Institute, CSIR; North-West University, 2014-07-04)
Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations ...
CTexTools
(North-West University; Centre for Text Technology (CTexT), 2014-05-30)
Corpus query and manipulation tool for performing tokenisation and sentencisation; extracting frequency lists and word lists; searching; and extracting ...
CTexT Alignment Interface Pro
(North-West University; Centre for Text Technology (CTexT), 2013-06-21)
Utility application for the manual alignment of source texts. The Pro version also allows the segments to be edited.
Autshumato Text Anonymiser
(North-West University; Centre for Text Technology (CTexT), 2013-06-20)
Anonymises text by classifying and replacing sensitive information such as person names, business names, place names, monetary values, phone numbers, ...
PHONAAS
(North-West University; Centre for Text Technology (CTexT), 2015-06-30)
PHONAAS is a graphical user interface (GUI) tool, written in Perl with GTK2, that uses the R programming language and PRAAT to extract vowel formant data.
Bilingual English-isiXhosa corpus
(North-West University; Centre for Text Technology (CTexT), 2019-11-30)
Aligned parallel corpus for the English-isiXhosa language pair.
The data is given as two separate UTF-8 text files, with each segment on a ...