SADiLaR Language Resource Repository
Welcome to the Language Resource Management Agency of SADiLaR. This repository provides access to all of the collections, data sets, tools and other language resources that are distributed by SADiLaR.
The repository will eventually replace all of the functionality of the original RMA site, and all of the resources available from the RMA will also be available from this repository.
Recently Added
Autshumato Monolingual Setswana Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato Monolingual Sesotho Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato Monolingual Sepedi Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Sepedi. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato Monolingual isiZulu Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for isiZulu. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato Monolingual Afrikaans Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Autshumato English-Xitsonga Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Aligned parallel corpora for the language pair English-Xitsonga. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
Autshumato English-Setswana Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Aligned parallel corpora for the language pair English-Setswana. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
Autshumato English-Sesotho Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Aligned parallel corpora for the language pair English-Sesotho. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
Autshumato English-Sepedi Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Aligned parallel corpora for the language pair English-Sepedi. The data is given as two separate UTF-8 text files, with each aligned segment on a new line. ...
Autshumato English-isiZulu Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30) Aligned parallel corpora for the language pair English-isiZulu. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
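The monolingual Autshumato entries above describe each corpus as a single UTF-8 text file with one segment per line. The following is a minimal Python sketch of how such a file might be read, assuming the downloaded data follows that layout. The function name `read_segments` is hypothetical, and skipping blank lines is an illustrative choice rather than something the distribution specifies.

```python
from typing import Iterator


def read_segments(path: str) -> Iterator[str]:
    """Yield each non-empty segment from a UTF-8 file with one segment per line.

    Assumption: the file uses the layout described in the repository entries
    (single UTF-8 text file, one segment per line). Skipping blank lines is an
    illustrative choice, not part of the corpus specification.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            segment = line.rstrip("\r\n")  # drop only the line terminator
            if segment:
                yield segment
```

For example, `sum(1 for _ in read_segments("corpus.txt"))` would count the segments in a file (the file name here is hypothetical). Reading lazily with a generator avoids loading a large corpus into memory at once.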
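The parallel Autshumato entries describe each language pair as two separate UTF-8 text files of aligned segments; the English-Sepedi entry states that each aligned segment is on a new line. Assuming all pairs follow that layout, with line N of one file corresponding to line N of the other, a minimal Python sketch for reading the pairs could look like this. The function name `read_parallel` is hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(source_path: str, target_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) segment pairs from two line-aligned UTF-8 files.

    Assumption: line N of one file is aligned with line N of the other.
    Blank lines are kept, because dropping one from either side would shift
    every later pair out of alignment.
    """
    with open(source_path, encoding="utf-8") as src, open(
        target_path, encoding="utf-8"
    ) as tgt:
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            # zip_longest pads the shorter file with None, exposing a length mismatch
            if s is None or t is None:
                raise ValueError(f"files are not aligned: one ends before line {line_no}")
            yield s.rstrip("\r\n"), t.rstrip("\r\n")
```

A call such as `read_parallel("english.txt", "sepedi.txt")` (hypothetical file names) yields pairs in file order. Raising an error on unequal lengths, rather than silently truncating as plain `zip` would, makes misaligned downloads easy to spot.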