Language Resource Management Agency
Browse by
In 2009, the South African National HLT Network (NHN) funded a technology audit that was conducted to form a clear profile of the research and development activities in the human language technology field in South Africa. This audit was used as the basis for the RMA Index, which is a list of South African resources with the relevant metadata (information such as developer details and specifications). Some of these resources are included in the RMA Catalogue, and are therefore available for download.
Collections in this community
-
Resource Catalogue [251]
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture. -
Resource Index [410]
A collection of language resource metadata mostly collected during the NHN funded technology audit of 2009, as well as the SADiLaR technology audit of 2018. Not all resources in this collection are available for download. -
Student Data Repository [6]
A collection of language resources available as part of the output of post-graduate study programs
Recent Submissions
-
South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) version 2023-03
(ICELDA; SADiLaR, 2023-03)The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ... -
Autshumato Monolingual Setswana Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato Monolingual Sesotho Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato Monolingual Sepedi Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for Sepedi. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato Monolingual isiZulu Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for isiZulu. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato Monolingual Afrikaans Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ... -
Autshumato English-Xitsonga Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Aligned parallel corpora for the language pair English-Xitsonga. The data is given as two separate UTF-8 text files, with each aligned segment on a ... -
Autshumato English-Setswana Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Aligned parallel corpora for the language pair English-Setswana. The data is given as two separate UTF-8 text files, with each aligned segment on a ... -
Autshumato English-Sesotho Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Aligned parallel corpora for the language pair English-Sesotho. The data is given as two separate UTF-8 text files, with each aligned segment on a ... -
Autshumato English-Sepedi Paralle Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)Aligned parallel corpora for the language pair English-Sepedi. The data is given as two separate UTF-8 text files, with each aligned segment on a newline. ...