SADiLaR Language Resource Repository
Welcome to the Language Resource Management Agency (RMA) of SADiLaR. This repository provides access to all of the collections, datasets, tools and other language resources distributed by SADiLaR.
The repository will eventually replace the original RMA site. All of that site's functionality, and all of the resources available from the RMA, will also be available from this repository.
Communities in SADiLaR
Select a community to browse its collections.
Recently Added
South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT)
(ICELDA; SADiLaR, 2021) The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...
Sesotho syllabification systems
(South African Centre for Digital Language Resources, 2022-02-03) This package contains two syllabification systems for Sesotho (rule-based and TeX-based).
Sesotho syllable wordlist
(South African Centre for Digital Language Resources, 2022-02-03) This package contains a wordlist of Sesotho words and their syllable information.
CTexT fastText Skipgram String Embeddings
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans fastText Skipgram String Embeddings is a 300-dimensional Afrikaans embedding model based on the Skipgram fastText architecture that ...
CTexT Afrikaans GloVe Word Embeddings
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans GloVe Word Embeddings is a 300-dimensional Afrikaans embedding model based on the Global Vectors architecture (Pennington et al., 2014) ...
CTexT Afrikaans FLAIR String Embeddings
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans FLAIR String Embeddings are two Afrikaans embedding models based on the FLAIR architecture (Akbik et al. 2018, 2019) that provide ...
CTexT Afrikaans FLAIR Named Entity Recognition model
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans FLAIR Named Entity Recognition model is a neural NER model based on the FLAIR framework (Akbik et al. 2019), and includes Afrikaans ...
CTexT Afrikaans fastText CBoW String Embeddings
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans fastText CBoW String Embeddings is a 300-dimensional Afrikaans embedding model based on the Continuous Bag of Words fastText ...
CTexT Afrikaans FLAIR Part of Speech tagger model
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans FLAIR Part of Speech tagger model is a neural part-of-speech tagger model based on the FLAIR framework (Akbik et al. 2019), and ...
Core technologies for conjunctively written South African languages
(North-West University, Centre for Language Technology (CTexT), 2021-03-31) During this SADiLaR-funded project, enriched corpora for the four official South African languages with a conjunctive orthography, i.e. isiNdebele ...