Combination Tagger
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ - Resource Index
The combination tagger framework uses MBT, SVM, MXPOST and TnT. Each tagger receives a weight by which it can vote for a tag.
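The weighted voting described above could look roughly like the following sketch. It is an illustration only, not the framework's actual code: the function name `weighted_vote`, the weight values, the example tags and the tie-breaking rule are all assumptions made for this example.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick the tag whose supporting taggers have the largest total weight.

    predictions: dict mapping tagger name -> tag proposed for one token
    weights:     dict mapping tagger name -> voting weight (hypothetical values)
    """
    scores = defaultdict(float)
    for tagger, tag in predictions.items():
        # Taggers without a configured weight contribute nothing.
        scores[tag] += weights.get(tagger, 0.0)
    # Assumed tie-break: the alphabetically first tag wins.
    return max(sorted(scores), key=scores.get)


# Hypothetical weights and per-token predictions, for illustration only.
weights = {"MBT": 3, "SVM": 4, "MXPOST": 2, "TnT": 1}
predictions = {"MBT": "NN", "SVM": "VB", "MXPOST": "NN", "TnT": "NN"}
best = weighted_vote(predictions, weights)
```

Here the three taggers proposing "NN" carry a combined weight of 6 against 4 for "VB", so "NN" is selected even though the single highest-weighted tagger disagrees.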
Autshumato Multilingual Word and Phrase Translations
(North-West University; Centre for Text Technology (CTexT), 2016-01-20) ~ - Resource Catalogue
Word and phrase lists aligned from English to the other official South African languages.
Lwazi Telephony Platform
(Meraka Institute, CSIR, 2013-07-15) ~ - Resource Catalogue
Lwazi is a robust telephony platform aiming to facilitate speedy development of experimental applications without sacrificing power by combining Asterisk ...
Qfrency TTS phone mappings
(CSIR, 2018-03-02) ~ - Resource Index
TTS phone mappings between IPA, XSAMPA and our Qfrency internal format, standardised across all 11 SA languages. To be used in conjunction with the Lwazi ...
CorpusCatcher
(Translate.org.za, 2015-01-28) ~ - Resource Index
CorpusCatcher is a tool designed to crawl the web and retrieve data for inclusion in a corpus. It makes use of seed documents/wordlists to ...
Final year high school examination texts of South African home and first additional language subjects
(South African Centre for Digital Language Resources, 2022-11-16)
This data collection consists of reading comprehension and summary writing texts. The texts comprise the final year high school exam texts for ...
NCHLT Part of Speech Taggers
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Part of speech taggers developed during the NCHLT Text project.
Available for the following languages: Afrikaans, English, isiNdebele, isiXhosa, isiZulu, ...
Linguistically enriched corpora for conjunctively written South African languages
(North-West University, Centre for Language Technology (CTexT), 2021-09)
This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family ...
NCHLT Siswati Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
NCHLT Siswati Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...