In 2009, the South African National HLT Network (NHN) funded a technology audit to form a clear profile of research and development activities in the human language technology (HLT) field in South Africa. This audit was used as the basis for the RMA Index, a list of South African resources with their relevant metadata (such as developer details and specifications). Some of these resources are also included in the RMA Catalogue and are therefore available for download.

Collections in this community

  • Resource Catalogue [214]

    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
  • Resource Index [324]

    A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.

Recent Submissions

  • CTexTools 2 

    Eiselen, Roald, et al. (North-West University, Centre for Text Technology (CTexT); South African Department of Arts and Culture, 2018-06) ~ Resource Catalogue
    CTexTools is a corpus query and manipulation tool primarily for the official South African languages. The tool supports the creation of frequency and ...
  • Afrikaans speaking children's first lexical items 

    Brink, Nina (North-West University, 2018)
    Data collected for a master's study in Afrikaans linguistics. The data consist of the first lexical items of 21 Afrikaans speaking children. The lexical ...
  • Setswana Test suite and Treebank 

    Berg, Ansu (North-West University, 2018) ~ Resource Catalogue
    The main aim of the PhD study "A computational syntactic analysis of Setswana" (AS Berg, May 2018) is the computational syntactic analysis of the Setswana ...
  • Lwazi III isiZulu TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III isiXhosa TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III English TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III Afrikaans TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • NCHLT Speech II Corpus 

    Jaco Badenhorst, et al. (Meraka Institute, CSIR, 2016-05-09) ~ Resource Catalogue
    The speech corpus, generated from aligned audio samples from National Parliament using Hansard transcriptions, is provided in terms of audio and ...
  • NCHLT isiNdebele Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
  • NCHLT Siswati Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
