Welcome to the Language Resource Management Agency of SADiLaR. This repository provides access to all of the collections, data sets, tools and other language resources that are distributed by SADiLaR.

The repository will eventually replace all of the functionality of the original RMA site, and all of the resources available from the RMA will also be available from this repository.

Select a community to browse its collections.

Language Resource Management Agency [405]
  • CGE's Sesotho Gender Terminology List 

    Commission for Gender Equality (CGE), et al. (Commission for Gender Equality (CGE), 2018)
    CGE's Sesotho Gender Terminology List is a list of terms, either words or phrases, related to the promotion of gender equality. All 446 words or phrases ...
  • Proof of concept: Afrikaans English Venda E-dictionary 

    Bosch, Sonja, et al. (Published as a Lexonomy dictionary (https://www.lexonomy.eu/POCVenEngAfr/), 2022-03-04)
    This proof of concept is a result of an experiment to compile a trilingual e-dictionary for Afrikaans, Venda and English. It includes 613 items and is ...
  • Bilingual English-Siswati Corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2022-03-31)
    Aligned parallel corpora for the following language pair: English-SiSwati. The data is given as four separate UTF-8 text files, with each segment on a ...
  • Monolingual Siswati Corpus 

    McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2022-03-31)
    Monolingual corpus for SiSwati. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data sourced ...
  • South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) 

    Van Dyk, Tobie (ICELDA; SADiLaR, 2021)
    The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...
  • Sesotho syllabification systems 

    Sibeko, Johannes, et al. (South African Centre for Digital Language Resources, 2022-02-03)
    This package contains two syllabification systems for Sesotho (rule-based and TeX-based).
  • Sesotho syllable wordlist 

    Sibeko, Johannes, et al. (South African Centre for Digital Language Resources, 2022-02-03)
    This package contains a wordlist of Sesotho words and their syllable information.
  • CTexT fastText Skipgram String Embeddings 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans fastText Skipgram String Embeddings is a 300-dimensional Afrikaans embedding model based on the Skipgram fastText architecture that ...
  • CTexT Afrikaans GloVe Word Embeddings 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans GloVe Word Embeddings is a 300-dimensional Afrikaans embedding model based on the Global Vectors architecture (Pennington, 2014) ...
  • CTexT Afrikaans FLAIR String Embeddings 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans FLAIR String Embeddings are two Afrikaans embedding models based on the FLAIR architecture (Akbik et al. 2018, 2019) that provides ...
