Resource Catalogue

A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.

Recent Submissions

African Wordnet version 1.0

Griesel, Marissa (UNISA, 2022-09-20)

Developed using the expand model with Princeton WordNet 3.1 as basis. Please see https://africanwordnet.wordpress.com/ for all details on the project. ...

Ex Machina: Using NLP and statistical learning models to model eyewitness statements and choosing behaviour

Nortje, Alicia, et al. (Sadilar, 2019-05-07)

This curated database includes data from various empirical studies where eyewitness statements and descriptions were collected. The original studies, ...

Autshumato English-Tshivenḓa Parallel Corpora

McKellar, Cindy (North-West University; Centre for Text Technology (CTexT), 2023-12-12)

Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced ...

Autshumato Monolingual Tshivenḓa Corpus

McKellar, Cindy (North-West University; Centre for Text Technology (CTexT), 2023-12-12)

Monolingual corpus for Tshivenḓa. The data is given as a single UTF-8 text file, with each segment on a new line.

Morphologically annotated corpus for isiNdebele

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...

Morphologically annotated corpus for isiXhosa

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...

Morphologically annotated corpus for isiZulu

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...

Morphologically annotated corpus for Siswati

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...

Morphologically annotated corpus for Sesotho

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Sesotho converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...

Morphologically annotated corpus for Sepedi

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Sepedi converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
