Resource Catalogue
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
Recent Submissions
Autshumato Monolingual English Corpus
(CTexT® (Centre for Text Technology, North-West University), 2023-10-30) Monolingual corpus for South African English. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically ...
African Wordnet version 1.0
(UNISA, 2022-09-20) Developed using the expand model with Princeton WordNet 3.1 as basis. Please see https://africanwordnet.wordpress.com/ for all details on the project. ...
Ex Machina: Using NLP and statistical learning models to model eyewitness statements and choosing behaviour
(Sadilar, 2019-05-07) This curated database includes data from various empirical studies in which eyewitness statements and descriptions were collected. The original studies, ...
Autshumato English-Tshivenḓa Parallel Corpora
(North-West University; Centre for Text Technology (CTexT), 2023-12-12) Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced ...
Autshumato Monolingual Tshivenḓa Corpus
(North-West University; Centre for Text Technology (CTexT), 2023-12-12) Monolingual corpus for Tshivenḓa. The data is given as a single UTF-8 text file, with each segment on a newline.
Morphologically annotated corpus for isiNdebele
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...
Morphologically annotated corpus for isiXhosa
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for isiZulu
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Siswati
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Sesotho
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in Sesotho converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...