Lwazi Telephony Platform
(Meraka Institute, CSIR, 2013-07-15) - Resource Catalogue
Lwazi is a robust telephony platform aiming to facilitate speedy development of experimental applications without sacrificing power by combining Asterisk ...
NCHLT Siswati word2vec-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
Qfrency TTS phone mappings
(CSIR, 2018-03-02) - Resource Index
TTS phone mappings between IPA, XSAMPA and our Qfrency internal format, standardised across all 11 SA languages. To be used in conjunction with the Lwazi ...
CorpusCatcher
(Translate.org.za, 2015-01-28) - Resource Index
Corpus Catcher is a tool that is designed to crawl the web to retrieve data for inclusion in a corpus. It makes use of seed documents/wordlists to ...
Final year high school examination texts of South African home and first additional language subjects
(South African Centre for Digital Language Resources, 2022-11-16)
This data collection consists of reading comprehension and summary writing texts. The texts comprise the final year high school exam texts for ...
NCHLT Part of Speech Taggers
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Part of speech taggers developed during the NCHLT Text project.
Available for the following languages: Afrikaans, English, isiNdebele, isiXhosa, isiZulu, ...
Linguistically enriched corpora for conjunctively written South African languages
(North-West University; Centre for Text Technology (CTexT), 2021-09)
This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family ...
NCHLT Siswati Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
NCHLT Siswati fastText-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...
NCHLT Siswati Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...