A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.

Recent Submissions

  • CTexTools 2 

    Eiselen, Roald, et al. (North-West University, Centre for Text Technology (CTexT); South African Department of Arts and Culture, 2018-06) ~ Resource Catalogue
    CTexTools is a corpus query and manipulation tool primarily for the official South African languages. The tool supports the creation of frequency and ...
  • Afrikaans speaking children's first lexical items 

    Brink, Nina (North-West University, 2018)
    Data collected for a master's study in Afrikaans linguistics. The data consist of the first lexical items of 21 Afrikaans-speaking children. The lexical ...
  • Setswana Test suite and Treebank 

    Berg, Ansu (North-West University, 2018) ~ Resource Catalogue
    The main aim of the PhD study "A computational syntactic analysis of Setswana" (AS Berg, May 2018) is the computational syntactic analysis of the Setswana ...
  • Lwazi III isiZulu TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III isiXhosa TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III English TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III Afrikaans TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • NCHLT Speech II Corpus 

    Jaco Badenhorst, et al. (Meraka Institute, CSIR, 2016-05-09) ~ Resource Catalogue
    The speech corpus generated from aligned audio samples from National Parliament using Hansard transcriptions is provided in terms of audio and ...
  • NCHLT isiNdebele Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
  • NCHLT Siswati Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
