A collection of language resource metadata, mostly gathered during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.

Recent Submissions

  • Sesotho syllabification systems 

    Sibeko, Johannes, et al. (South African Centre for Digital Language Resources, 2022-02-03)
    This package contains two syllabification systems for Sesotho (rule-based and TeX-based).
  • Sesotho syllable wordlist 

    Sibeko, Johannes, et al. (South African Centre for Digital Language Resources, 2022-02-03)
    This package contains a list of Sesotho words with their syllable information.
  • Linguistically enriched corpora for conjunctively written South African languages 

    Puttkammer, Martin, et al. (North-West University, Centre for Language Technology (CTexT), 2021-09)
    This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family ...
  • Description of N|uu 

    Sands, Bonny, et al. (Bonny Sands, 2015-10-06)
    Recordings of dictionary entries for a pan-dialectal dictionary of the N|uu language (Eastern and Western dialects) made by Bonny Sands, Johanna Brugman, ...
  • Mburisano Covid-19 multilingual corpus 

    Marais, Laurette (CSIR Voice Computing, 2020-12-04)
    This corpus was created to aid development of the AwezaMed Covid-19 speech-to-speech mobile application. The project within which it was created, ...
  • Denominal adjectives in Afrikaans dataset 

    Trollip, Benito (South African Centre for Digital Language Resources, 2020-05-15) ~ Resource Catalogue
    This dataset contains a collection of Afrikaans denominal adjectives that were extracted from the Virtual Institute for Afrikaans' corpus portal. The ...
  • Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad 

    Coetzer, G.C., et al. (North-West University, 2019-02-18) ~ Resource Catalogue
  • SPCS Speech Corpus 

    Modipa, T. I., et al. (Council for Scientific and Industrial Research; North-West University, 2015-11-25) ~ Resource Catalogue
    Broadband speech corpus of approximately 10 hours and the corresponding transcriptions. The development process of the corpus involved the recording ...
  • Speech transcription platform user interface 

    Kleynhans, Neil, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the user interface component of the Speech Transcription Platform developed by the Multilingual Speech Technologies group at North-West University ...
  • Speech transcription platform speech services 

    Van Niekerk, Daniel, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the Language Technology Services component implemented for the Speech Transcription Platform project by the Multilingual Speech Technologies ...
