Resource Index: Recent submissions
Now showing items 1-10 of 414
CSIR SAMA Speech Corpus Manual Datasets
(Voice Computing (VC) Research Group at the CSIR Nextgen Enterprises and Institutions (NGEI); SADiLaR, 2023-12) The evaluation corpus contains orthographically transcribed broadband speech in Afrikaans, isiXhosa, isiZulu, Sepedi, Sesotho, Tshivenḓa all part of ...
AwezaMed automatic speech recognition (ASR) test data
(Voice Computing (VC) Research Group at the CSIR Nextgen Enterprises and Institutions (NGEI), 2020-12) The corpus contains orthographically transcribed broadband speech in four official languages of South Africa: Afrikaans, English, isiXhosa and isiZulu. ...
IsiZulu Second Language Learner Speech Corpus
(Indiana University, 2024) This corpus is specifically designed to assist in evaluating the performance of pronunciation feedback tools for second language learning. The corpus ...
Afrikaans lexical blends dataset
(North-West University, 2023-12) This is a dataset of Afrikaans blend constructions that have been collected and analysed using the Levenshtein distance metric. This dataset serves as the ...
South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) version 2023-03
(ICELDA; SADiLaR, 2023-03) The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...
Autshumato Monolingual Setswana Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato Monolingual Sesotho Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato Monolingual Sepedi Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Sepedi. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato Monolingual isiZulu Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for isiZulu. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato Monolingual Afrikaans Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
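
The Autshumato monolingual corpora above are described as a single UTF-8 text file with one segment per line. A minimal Python sketch of reading such a file might look like the following. The function names and the file name are hypothetical, and skipping blank lines is an assumption made here; the listing does not say whether the files contain any.

```python
from pathlib import Path
from typing import Iterator


def read_segments(path: str) -> Iterator[str]:
    """Yield one segment per line from a UTF-8 text file.

    Assumption: blank lines are not segments and are skipped.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            segment = line.rstrip("\r\n")  # drop only the line terminator
            if segment.strip():            # ignore empty or whitespace-only lines
                yield segment


def count_segments(path: str) -> int:
    """Count segments without loading the whole file into memory."""
    return sum(1 for _ in read_segments(path))
```

For example, `count_segments("setswana_corpus.txt")` (a hypothetical local file name) would return the number of non-empty lines. Reading lazily keeps memory use flat even for large corpus files.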