A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.

Recent Submissions

  • Afrikaans text unit identification data 

    Puttkammer, Martin (Centre for Text Technology, North-West University, 2006) ~ Resource Catalogue
    This dataset was developed during a master's degree and used in the development of a text unit identifier capable of tagging sentences, named entities, ...
  • Autshumato Machine Translation Evaluation Set 

    McKellar, Cindy Arlene (North-West University; Centre for Text Technology (CTexT); Department of Arts and Culture, South Africa, 2017-12-15) ~ Resource Catalogue
    Comparable evaluation data for use in automatic machine translation evaluations. The evaluation set consists of 500 sentences translated separately by ...
  • CTexTools 2 

    Eiselen, Roald, et al. (North-West University, Centre for Text Technology (CTexT); South African Department of Arts and Culture, 2018-05-24) ~ Resource Catalogue
    CTexTools is a corpus query and manipulation tool primarily for the official South African languages. The tool supports the creation of frequency and ...
  • Afrikaans speaking children's first lexical items 

    Brink, Nina (North-West University, 2018-05-17)
    Data collected for a master's study in Afrikaans linguistics. The data consist of the first lexical items of 21 Afrikaans-speaking children. The lexical ...
  • Setswana Test suite and Treebank 

    Berg, Ansu (North-West University, 2018-03-27) ~ Resource Catalogue
    The main aim of the PhD study "A computational syntactic analysis of Setswana" (AS Berg, May 2018) is the computational syntactic analysis of the Setswana ...
  • Lwazi III isiZulu TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III isiXhosa TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III English TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III Afrikaans TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • NCHLT Speech II Corpus 

    Jaco Badenhorst, et al. (Meraka Institute, CSIR, 2016-05-09) ~ Resource Catalogue
    The speech corpus, generated from aligned audio samples from National Parliament using Hansard transcriptions, is provided in terms of audio and ...
