A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.

Recent Submissions

  • N|uu language archive 

    Collins, Chris, et al. (Collins, Chris; Sands, Bonny; Jones, Kerry, 2022-08-11)
    This collection contains information that forms the basis of the N|uu dictionary which contains a word list for N|uu with translations into Afrikaans, ...
  • CGE's Sesotho Gender Terminology List 

    Commission for Gender Equality (CGE), et al. (Commission for Gender Equality (CGE), 2018)
    CGE's Sesotho Gender Terminology List is a list of terms, either words or phrases, related to the promotion of gender equality. All 446 words or phrases ...
  • Proof of concept: Afrikaans English Venda E-dictionary 

    Bosch, Sonja, et al. (Published as a Lexonomy dictionary (https://www.lexonomy.eu/POCVenEngAfr/), 2022-03-04)
    This proof of concept is a result of an experiment to compile a trilingual e-dictionary for Afrikaans, Venda and English. It includes 613 items and is ...
  • Bilingual English-Siswati Corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2022-03-31)
    Aligned parallel corpora for the following language pair: English-SiSwati. The data is given as four separate UTF-8 text files, with each segment on a ...
  • Monolingual Siswati Corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2022-03-31)
    Monolingual corpus for SiSwati. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data sourced ...
  • Sesotho syllabification systems 

    Sibeko, Johannes, et al. (South African Centre for Digital Language Resources, 2022-02-03)
    This package contains two syllabification systems for Sesotho (rule-based and TeX-based).
  • Sesotho syllable wordlist 

    Sibeko, Johannes, et al. (South African Centre for Digital Language Resources, 2022-02-03)
    This package contains a wordlist containing Sesotho words and their syllable information.
  • Linguistically enriched corpora for conjunctively written South African languages 

    Puttkammer, Martin, et al. (North-West University, Centre for Language Technology (CTexT), 2021-09)
    This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family ...
  • Description of N|uu 

    Sands, Bonny, et al. (Bonny Sands, 2015-10-06)
    Recordings of dictionary entries for a pan-dialectal dictionary of the N|uu language (Eastern and Western dialects) made by Bonny Sands, Johanna Brugman, ...
  • Mburisano Covid-19 multilingual corpus 

    Marais, Laurette (CSIR Voice Computing, 2020-12-04)
    This corpus was created to aid development of the AwezaMed Covid-19 speech-to-speech mobile application. The project within which it was created, ...
