
    • CSIR SAMA Speech Corpus Manual Datasets 

      Badenhorst, Jaco, et al. (Voice Computing (VC) Research Group at the CSIR Nextgen Enterprises and Institutions (NGEI); SADiLaR, 2023-12)
      The evaluation corpus contains orthographically transcribed broadband speech in Afrikaans, isiXhosa, isiZulu, Sepedi, Sesotho, Tshivenḓa, all part of ...
    • AwezaMed automatic speech recognition (ASR) test data 

      Badenhorst, Jaco (Voice Computing (VC) Research Group at the CSIR Nextgen Enterprises and Institutions (NGEI), 2020-12)
      The corpus contains orthographically transcribed broadband speech in four official languages of South Africa: Afrikaans, English, isiXhosa and isiZulu. ...
    • IsiZulu Second Language Learner Speech Corpus 

      O'Neil, Alexandra, et al. (Indiana University, 2024)
      This corpus is specifically designed to assist in evaluating the performance of pronunciation feedback tools for second language learning. The corpus ...
    • Afrikaans lexical blends dataset 

      Trollip, Benito, et al. (North-West University, 2023-12)
      This is a dataset of Afrikaans blend constructions that have been collected and analysed using the Levenshtein distance metric. This dataset serves as the ...
    • South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) version 2023-03 

      Van Dyk, Tobie (ICELDA; SADiLaR, 2023-03)
      The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...
    • Autshumato Monolingual Setswana Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato Monolingual Sesotho Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato Monolingual Sepedi Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for Sepedi. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato Monolingual isiZulu Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for isiZulu. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato Monolingual Afrikaans Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...