NCHLT Sesotho FLAIR-backward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
NCHLT Sesotho RoBERTa language model
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
SADE Municipality Hotline IVR Prompts
(North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC, 2015-09-07)
Audio and corresponding transcriptions for the SADE Municipality Hotline IVR prompts in English, Sesotho and isiZulu. The English SADE municipality ...
NCHLT Sesotho FLAIR-forward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
Autshumato Text Anonymiser
(North-West University; Centre for Text Technology (CTexT), 2013-06-20)
Anonymises text by classifying and replacing sensitive information such as person names, business names, place names, monetary values, phone numbers, ...
NCHLT Sesotho Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30)
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Sesotho fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
Lwazi II Cross-lingual Proper Name Corpus
(Meraka Institute, CSIR; North-West University, 2015-11-20)
Prompted audio recordings of personal names in different languages, produced by 20 speakers with different language backgrounds.
South African Directory Enquiries (SADE) Name Corpus
(North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC, 2015-09-07)
Audio and tagged orthographic transcriptions of South African names produced by first-language speakers of 4 languages: Afrikaans, English, isiZulu, ...
NCHLT Sesotho word2vec-CBOW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides ...