A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.

Recent Submissions

  • Corpus of multilingual code-switched soap opera speech 

    van der Westhuizen, Ewald, et al. (Stellenbosch University, 2020-02-28)
    The corpus comprises 26.9 hours of annotated multilingual speech that contains examples of code-switching in isiZulu, isiXhosa, Setswana, Sesotho and ...
  • COVID-19 Multilingual Terminology 

    City of Tshwane, et al. (City of Tshwane; South African Centre for Digital Language Resources (SADiLaR); Department of Science and Innovation; Pan South African Language Board (PanSALB), 2021-07)
    COVID-19 multilingual terminology list document in all the South African languages. The development of this terminology list was initiated by City of ...
  • CGE's Afrikaans Gender Terminology List 

    Commission for Gender Equality (CGE), et al. (Commission for Gender Equality (CGE), 2021-04)
    CGE's Afrikaans Gender Terminology List is a list of terms, either words or phrases, related to the promotion of gender equality. All 436 words or phrases ...
  • Human Language Technology Audit 2017/18 

    Moors, Carmen, et al. (CSIR, 2018-08-31)
    This document reports on all work conducted in the 2017/18 Audit of human language technology (HLT) resources available in South Africa project. The ...
  • Generic Bilingual Academic Wordlist with Definitions 

    ICELDA, et al. (ICELDA; SADiLaR, 2021)
    The academic wordlist was developed as a resource to help students better understand words used within the information they ...
  • Denominal adjectives in Afrikaans dataset 

    Trollip, Benito (South African Centre for Digital Language Resources, 2020-05-15) ~ Resource Catalogue
    This dataset contains a collection of Afrikaans denominal adjectives that were extracted from the Virtual Institute for Afrikaans' corpus portal. The ...
  • Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad 

    Coetzer, G.C., et al. (North-West University, 2019-02-18) ~ Resource Catalogue
  • SPCS Speech Corpus 

    Modipa, T. I., et al. (Council for Scientific and Industrial Research; North-West University, 2015-11-25) ~ Resource Catalogue
    Broadband speech corpus of approximately 10 hours and the corresponding transcriptions. The development process of the corpus involved the recording ...
  • High quality TTS data for four South African languages (af, st, tn, xh) 

    Unknown author (Google; North-West University, 2017) ~ Resource Catalogue
    This data set contains multi-speaker TTS high quality transcribed audio data for four languages of South Africa: Afrikaans, Sesotho, Setswana and isiXhosa. ...
  • Bilingual English-isiXhosa corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
    Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with each segment on a ...
