A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.

Recent Submissions

  • Generic Bilingual Academic Wordlist with Definitions 

    ICELDA, et al. (ICELDA; SADiLaR, 2021)
    The academic wordlist was developed as a resource to help students better understand words used within the information they ...
  • Denominal adjectives in Afrikaans dataset 

    Trollip, Benito (South African Centre for Digital Language Resources, 2020-05-15) ~ Resource Catalogue
    This dataset contains a collection of Afrikaans denominal adjectives that were extracted from the Virtual Institute for Afrikaans' corpus portal. The ...
  • Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad 

    Coetzer, G.C., et al. (North-West University, 2019-02-18) ~ Resource Catalogue
  • SPCS Speech Corpus 

    Modipa, T. I., et al. (Council for Scientific and Industrial Research; North-West University, 2015-11-25) ~ Resource Catalogue
    Broadband speech corpus of approximately 10 hours and the corresponding transcriptions. The development process of the corpus involved the recording ...
  • High quality TTS data for four South African languages (af, st, tn, xh) 

    Unknown author (Google; North-West University, 2017) ~ Resource Catalogue
    This data set contains high-quality, multi-speaker transcribed TTS audio data for four languages of South Africa: Afrikaans, Sesotho, Setswana and isiXhosa. ...
  • Bilingual English-isiXhosa corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
    Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with each segment on a ...
  • Monolingual isiXhosa corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
    Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data ...
  • NCHLT English Auxiliary Speech Corpus 

    Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
    The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
  • NCHLT Afrikaans Auxiliary Speech Corpus 

    Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
    The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
  • NCHLT Xitsonga Auxiliary Speech Corpus 

    Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
    The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
