A collection of language resource metadata, mostly gathered during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.

Recent Submissions

  • Lwazi III isiZulu TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III isiXhosa TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III English TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • Lwazi III Afrikaans TTS Corpus 

    Aby Louw, et al. (Meraka Institute, CSIR, 2016-06-17) ~ Resource Catalogue
    Complete audio recordings with orthographic transcriptions. TTS corpus for standard SA dialect. This corpus was created to enable the building of a TTS voice.
  • NCHLT Speech II Corpus 

    Jaco Badenhorst, et al. (Meraka Institute, CSIR, 2016-05-09) ~ Resource Catalogue
    The speech corpus, generated from aligned audio samples from National Parliament using Hansard transcriptions, is provided in terms of audio and ...
  • NCHLT isiNdebele Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
  • NCHLT Siswati Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
  • NCHLT Sepedi Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
  • NCHLT Xitsonga Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
  • NCHLT Tshivenda Speech Corpus 

    Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
    Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
