Welcome to the Language Resource Management Agency of SADiLaR. This repository provides access to all of the collections, data sets, tools and other language resources that are distributed by SADiLaR.

This repository will eventually replace all of the functionality of the original RMA site, and all of the resources available from the RMA will also be available here.

Select a community to browse its collections.

Language Resource Management Agency [378]
  • Denominal adjectives in Afrikaans dataset 

    Trollip, Benito (South African Centre for Digital Language Resources, 2020-05-15) ~ Resource Catalogue
    This dataset contains a collection of Afrikaans denominal adjectives that were extracted from the Virtual Institute for Afrikaans' corpus portal. The ...
  • Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad 

    Coetzer, G.C., et al. (North-West University, 2019-02-18) ~ Resource Catalogue
    Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad
  • SPCS Speech Corpus 

    Modipa, T. I., et al. (Council for Scientific and Industrial Research; North-West University, 2015-11-25) ~ Resource Catalogue
    Broadband speech corpus of approximately 10 hours and the corresponding transcriptions. The development process of the corpus involved the recording ...
  • Speech transcription platform user interface 

    Kleynhans, Neil, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the user interface component of the Speech Transcription Platform developed by the Multilingual Speech Technologies group at North-West University ...
  • Speech transcription platform speech services 

    Van Niekerk, Daniel, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the Language Technology Services component implemented for the Speech Transcription Platform project by the Multilingual Speech Technologies ...
  • High quality TTS data for four South African languages (af, st, tn, xh) 

    Unknown author (Google; North-West University, 2017) ~ Resource Catalogue
    This data set contains multi-speaker TTS high quality transcribed audio data for four languages of South Africa: Afrikaans, Sesotho, Setswana and isiXhosa. ...
  • Speech transcription server 

    Kleynhans, Neil, et al. (Multilingual Speech Technologies, North-West University, 2017) ~ Resource Index
    This is the "Parliament-specific" application server component implemented as a proof-of-concept during the Speech Transcription Platform project by the ...
  • Bilingual English-isiXhosa corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
    Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with each segment on a ...
  • Monolingual isiXhosa corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
    Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data ...
  • NCHLT English Auxiliary Speech Corpus 

    De Wet, Febe, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
    The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
