Search
Now showing items 21-30 of 67
Corpus of multilingual code-switched soap opera speech
(Stellenbosch University, 2020-02-28)
The corpus comprises 26.9 hours of annotated multilingual speech that contains examples of code-switching in isiZulu, isiXhosa, Setswana, Sesotho and ...
NCHLT isiZulu Speech Corpus
(Meraka Institute, CSIR; North-West University, 2014-07-08) ~ - Resource Catalogue
Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
NCHLT isiZulu RoBERTa language model
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
NCHLT isiZulu FLAIR-forward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
NCHLT isiZulu Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
\n\n
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
SADE Municipality Hotline IVR Prompts
(North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC, 2015-09-07) ~ - Resource Catalogue
Audio and corresponding transcriptions for the SADE Municipality Hotline IVR prompts in English, Sesotho and isiZulu. The English SADE municipality ...
NCHLT isiZulu fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
Autshumato Text Anonymiser
(North-West University; Centre for Text Technology (CTexT), 2013-06-20) ~ - Resource Catalogue
Anonymises text by classifying and replacing sensitive information such as person names, business names, place names, monetary values, phone numbers, ...
Autshumato English-isiZulu Parallel Corpora
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ - Resource Catalogue
Parallel corpora aligned on sentence level through a combination of automatic and manual alignment techniques. The parallel corpora were obtained from ...
Translate.org.za isiZulu - isiXhosa Corpus 2012
(Translate.org.za, 2013-06-19) ~ - Resource Catalogue
isiZulu-isiXhosa translation memory.