Now showing items 51-60 of 345
NCHLT Siswati fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
Multilingual Mathematics Terminology List (Grade R - 6)
(Terminology Coordination Section of the National Language Service, Department of Arts and Culture, 2017-03-03) - Resource Index
984 English source terms with their equivalents in the ten other official South African languages. The list was compiled in collaboration with subject ...
NCHLT Sepedi FLAIR-forward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
CKarma
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) - Resource Index
CKarma is a compound analyser for Afrikaans, to be used for the detection of word boundaries within compounds. It takes as input a string, and produces ...
Pretoria Sepedi Corpus (Gold Standard)
(Department of African Languages - University of Pretoria, 2015-01-27) - Resource Index
A section of the Pretoria Sepedi Corpus with manually checked part-of-speech (POS) tags.
Sepedi Grapheme-to-Phoneme Converter
(University of South Africa, 2015-01-28) - Resource Index
Converts morphemes of Sesotho sa Leboa (Sepedi) to phonological representations.
EWA
(WAT (Afrikaans NLU), 2013-07-01) - Resource Index
Die Etimologiewoordeboek van die Afrikaanse Taal (EWA) is an Afrikaans etymological dictionary, distributed in Folio Views for Windows 4.6.0.7.
Autshumato Xitsonga Monolingual Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-12-12) - Resource Catalogue
Xitsonga monolingual corpus, a deliverable of the Autshumato project. The data is provided as a UTF-8 text file, with one sentence per line.
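A sentence-per-line UTF-8 file like the one described above could be read in Python roughly as follows. This is a minimal sketch: the helper name read_sentences and the file name corpus_sample.txt are hypothetical and not part of the Autshumato distribution, and the demo writes its own placeholder file rather than using real corpus data.

```python
import tempfile
from pathlib import Path


def read_sentences(path):
    """Yield non-empty sentences from a UTF-8 file with one sentence per line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            sentence = line.strip()
            if sentence:  # skip blank lines
                yield sentence


# Demo on a throwaway file (placeholder text, not actual corpus content).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "corpus_sample.txt"  # hypothetical file name
    demo_path.write_text("Sentence one.\n\nSentence two.\n", encoding="utf-8")
    sentences = list(read_sentences(demo_path))

print(len(sentences), "sentences read")
```

Reading lazily with a generator keeps memory use low for large corpora. Wrapping the call in list(), as the demo does, is only sensible for small files.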
Autshumato English-Afrikaans Translation Memory
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) - Resource Catalogue
Translation memory from English (EN-GB) to Afrikaans in the government domain, for use in the Autshumato ITE application.
South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) version 2023-03
(ICELDA; SADiLaR, 2023-03)
The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...