In 2009, the South African National HLT Network (NHN) funded a technology audit to form a clear profile of research and development activities in the human language technology (HLT) field in South Africa. The audit served as the basis for the RMA Index, a list of South African resources with their relevant metadata (such as developer details and specifications). Some of these resources are also included in the RMA Catalogue and are therefore available for download.

Collections in this community

  • Resource Catalogue [220]

    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
  • Resource Index [355]

    A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.

Recent Submissions

  • Sesotho vowel speech data set 

    Wissing, Daan (Centre for Text Technology, North-West University, 2019-05-28) ~ Resource Catalogue
    The primary aim of this speech dataset was to collect a representative set of words in which all the Sesotho vowels are present. Some of them are ...
  • Sesotho function word speech data 

    Wissing, Daan (Centre for Text Technology, North-West University, 2019) ~ Resource Catalogue
    The primary aim of this speech data set was to study the role of tone in the function word "ke" in the minimal pairs "ke motho" and in the function word ...
  • Read Afrikaans Normal/ Read Afrikaans Fast 

    Wissing, Daan (Centre for Text Technology, North-West University, 2019) ~ Resource Catalogue
    The corpus contains speech of 127 mother tongue speakers of Afrikaans. Every speaker was asked to read a text fragment from a book or a newspaper (about ...
  • Sesotho tone data set 

    Wissing, Daan (Centre for Text Technology, North-West University, 2019-05-28) ~ Resource Catalogue
    These recordings are of male and female speakers (11 for tasks 1 and 2; 10 for task 3) of the QwaQwa region (Eastern Free State). Ages of the speakers ...
  • Afrikaans text unit identification data 

    Puttkammer, Martin (Centre for Text Technology, North-West University, 2006) ~ Resource Catalogue
    This dataset was developed during a master's degree and used in the development of a text unit identifier capable of tagging sentences, named entities, ...
  • Autshumato Machine Translation Evaluation Set 

    McKellar, Cindy Arlene (North-West University; Centre for Text Technology (CTexT); Department of Arts and Culture, South Africa, 2017-12-15) ~ Resource Catalogue
    Comparable evaluation data for use in automatic machine translation evaluations. The evaluation set consists of 500 sentences translated separately by ...
  • Qfrency TTS phone mappings 

    Unknown author (CSIR, 2018-03-02) ~ Resource Index
    TTS phone mappings between IPA, X-SAMPA and our Qfrency internal format, standardised across all 11 SA languages. To be used in conjunction with the Lwazi ...
  • GF Miniature Resource for Tswana 

    Laurette Marais, Meraka, et al. (HLT Research Group, Meraka Institute, CSIR, 2018-03-06) ~ Resource Index
    This miniature resource grammar parses and generates main clause sentences in various tenses, moods and aspects in Tswana. The lexicon is limited, but ...
  • Qfrency TTS Afrikaans Maryna recordings 

    Unknown author (CSIR, 2018-03-07) ~ Resource Index
    Studio-quality recordings of text-to-speech data in Afrikaans, with some English utterances, by a professional Afrikaans first-language voice artist.
  • Qfrency TTS Afrikaans Kobus recordings 

    Unknown author (CSIR, 2018-03-07) ~ Resource Index
    Studio-quality recordings of text-to-speech data in Afrikaans, with some English utterances, by a professional Afrikaans first-language voice artist.
