SADiLaR Language Resource Repository: Recent submissions
Now showing items 101-110 of 529
-
NCHLT Afrikaans fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ... -
South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) version 2023-03
(ICELDA; SADiLaR, 2023-03)The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ... -
Autshumato Monolingual Setswana Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato Monolingual Sesotho Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato Monolingual Sepedi Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for Sepedi. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato Monolingual isiZulu Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for isiZulu. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato Monolingual Afrikaans Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato English-Xitsonga Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Aligned parallel corpora for the language pair English-Xitsonga. The data is given as two separate UTF-8 text files, with each aligned segment on a ... -
Autshumato English-Setswana Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Aligned parallel corpora for the language pair English-Setswana. The data is given as two separate UTF-8 text files, with each aligned segment on a ... -
Autshumato English-Sesotho Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Aligned parallel corpora for the language pair English-Sesotho. The data is given as two separate UTF-8 text files, with each aligned segment on a ...