Welcome to the Language Resource Management Agency of SADiLaR. This repository provides access to all of the collections, data sets, tools and other language resources that are distributed by SADiLaR.

The repository will eventually replace all of the functionality of the original RMA site, and all of the resources available from the RMA will also be available from this repository.

Select a community to browse its collections.

Language Resource Management Agency [401]
  • South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) 

    Van Dyk, Tobie (ICELDA; SADiLaR, 2021)
    The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...
  • Sesotho syllabification systems 

    Sibeko, Johannes, et al. (South African Centre for Digital Language Resources, 2022-02-03)
    This package contains two syllabification systems for Sesotho (rule-based and TeX-based).
  • Sesotho syllable wordlist 

    Sibeko, Johannes, et al. (South African Centre for Digital Language Resources, 2022-02-03)
    This package contains a wordlist containing Sesotho words and their syllable information.
  • CTexT fastText Skipgram String Embeddings 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans fastText Skipgram String Embeddings is a 300-dimensional Afrikaans embedding model based on the Skipgram fastText architecture that ...
  • CTexT Afrikaans GloVe Word Embeddings 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans GloVe Word Embeddings is a 300-dimensional Afrikaans embedding model based on the Global Vectors architecture (Pennington, 2014) ...
  • CTexT Afrikaans FLAIR String Embeddings 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans FLAIR String Embeddings are two Afrikaans embedding models based on the FLAIR architecture (Akbik et al. 2018, 2019) that provides ...
  • CTexT Afrikaans FLAIR Named Entity Recognition model 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans FLAIR Named Entity Recognition model is a neural NER model based on the FLAIR framework (Akbik et al. 2019), and includes Afrikaans ...
  • CTexT Afrikaans fastText CBoW String Embeddings 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans fastText CBoW String Embeddings is a 300-dimensional Afrikaans embedding model based on the Continuous Bag of Words fastText ...
  • CTexT Afrikaans FLAIR Part of Speech tagger model 

    Eiselen, Roald (Centre for Text Technology (CTexT), 2022-01-10)
    The CTexT Afrikaans FLAIR Part of Speech tagger model is a neural part of speech tagger model based on the FLAIR framework (Akbik et al. 2019), and ...
  • Core technologies for conjunctively written South African languages 

    Du Toit, Jaco, et al. (North-West University, Centre for Language Technology (CTexT), 2021-03-31)
    During this SADiLaR-funded project, enriched corpora for the four official South African languages with a conjunctive orthography, i.e. isiNdebele ...
