A collection of language resource metadata, mostly gathered during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.

Recent Submissions

  • Linguistically enriched corpora for conjunctively written South African languages 

    Puttkammer, Martin, et al. (North-West University, Centre for Language Technology (CTexT), 2021-09)
    This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family ...
  • Description of N|uu 

    Sands, Bonny, et al. (Bonny Sands, 2015-10-06)
    Recordings of dictionary entries for a pan-dialectal dictionary of the N|uu language (Eastern and Western dialects) made by Bonny Sands, Johanna Brugman, ...
  • Mburisano Covid-19 multilingual corpus 

    Marais, Laurette (CSIR Voice Computing, 2020-12-04)
    This corpus was created to aid development of the AwezaMed Covid-19 speech-to-speech mobile application. The project within which it was created, ...
  • Denominal adjectives in Afrikaans dataset 

    Trollip, Benito (South African Centre for Digital Language Resources, 2020-05-15) ~ Resource Catalogue
    This dataset contains a collection of Afrikaans denominal adjectives that were extracted from the Virtual Institute for Afrikaans' corpus portal. The ...
  • Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad 

    Coetzer, G.C., et al. (North-West University, 2019-02-18) ~ Resource Catalogue
  • SPCS Speech Corpus 

    Modipa, T. I., et al. (Council for Scientific and Industrial Research; North-West University, 2015-11-25) ~ Resource Catalogue
    Broadband speech corpus of approximately 10 hours and the corresponding transcriptions. The development process of the corpus involved the recording ...
  • Speech transcription platform user interface 

    Kleynhans, Neil, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the user interface component of the Speech Transcription Platform developed by the Multilingual Speech Technologies group at North-West University ...
  • Speech transcription platform speech services 

    Van Niekerk, Daniel, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the Language Technology Services component implemented for the Speech Transcription Platform project by the Multilingual Speech Technologies ...
  • High quality TTS data for four South African languages (af, st, tn, xh) 

    Unknown author (Google; North-West University, 2017) ~ Resource Catalogue
    This data set contains high-quality, multi-speaker transcribed TTS audio data for four languages of South Africa: Afrikaans, Sesotho, Setswana and isiXhosa. ...
  • Speech transcription server 

    Kleynhans, Neil, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the "Parliament-specific" application server component implemented as a proof-of-concept during the Speech Transcription Platform project by the ...
