Welcome to the Language Resource Management Agency of SADiLaR. This repository provides access to all of the collections, data sets, tools and other language resources that are distributed by SADiLaR.

The repository will eventually replace all of the functionality of the original RMA site, and all of the resources available from the RMA will also be available from this repository.

Select a community to browse its collections.

Language Resource Management Agency [375]
  • Speech transcription platform user interface 

    Kleynhans, Neil, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the user interface component of the Speech Transcription Platform developed by the Multilingual Speech Technologies group at North-West University ...
  • Speech transcription platform speech services 

    Van Niekerk, Daniel, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the Language Technology Services component implemented for the Speech Transcription Platform project by the Multilingual Speech Technologies ...
  • High quality TTS data for four South African languages (af, st, tn, xh) 

    Unknown author (Google; North-West University, 2017) ~ Resource Catalogue
    This data set contains high-quality, multi-speaker transcribed audio data for TTS in four languages of South Africa: Afrikaans, Sesotho, Setswana and isiXhosa. ...
  • Speech transcription server 

    Kleynhans, Neil, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the "Parliament-specific" application server component implemented as a proof-of-concept during the Speech Transcription Platform project by the ...
  • Bilingual English-isiXhosa corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
    Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with each segment on a ...
  • Monolingual isiXhosa corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
    Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data ...
  • NCHLT English Auxiliary Speech Corpus 

    Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
    The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
  • NCHLT Afrikaans Auxiliary Speech Corpus 

    Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
    The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
  • NCHLT Xitsonga Auxiliary Speech Corpus 

    Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
    The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
  • NCHLT Setswana Auxiliary Speech Corpus 

    Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
    The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
