Resource Index: Recent submissions
Now showing items 1-10 of 412
- IsiZulu Second Language Learner Speech Corpus
  (Indiana University, 2024) This corpus is specifically designed to assist in evaluating the performance of pronunciation feedback tools for second language learning. The corpus ...
- Afrikaans lexical blends dataset
  (North-West University, 2023-12) This is a dataset of Afrikaans blend constructions that have been collected and analysed using the Levenshtein distance metric. This dataset serves as the ...
- South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) version 2023-03
  (ICELDA; SADiLaR, 2023-03) The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...
- Autshumato Monolingual Setswana Corpus
  (CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
- Autshumato Monolingual Sesotho Corpus
  (CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
- Autshumato Monolingual Sepedi Corpus
  (CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Sepedi. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
- Autshumato Monolingual isiZulu Corpus
  (CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for isiZulu. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
- Autshumato Monolingual Afrikaans Corpus
  (CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
- Autshumato English-Xitsonga Parallel Corpora
  (CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Aligned parallel corpora for the language pair English-Xitsonga. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
- Autshumato English-Setswana Parallel Corpora
  (CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Aligned parallel corpora for the language pair English-Setswana. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
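
The monolingual Autshumato entries above describe their data as a single UTF-8 text file with each segment on a newline. A minimal sketch of reading such a file might look like the following. The function name `read_segments` and the file name in the usage note are hypothetical, and skipping blank lines is an assumption rather than something the listing states.

```python
def read_segments(path):
    """Read a UTF-8 text file that holds one segment per line.

    Assumption: blank lines carry no segment and are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        # Drop only the trailing newline so any other whitespace is preserved.
        return [line.rstrip("\n") for line in handle if line.strip()]
```

Usage would be along the lines of `segments = read_segments("setswana_corpus.txt")`, where the file name stands in for whatever the downloaded corpus file is actually called.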