Search
Now showing items 221-230 of 240
isiNdebele Genre Classification Corpus
(Trifonius, 2013-06-19) ~ - Resource Catalogue
Contains training and testing data for Genre Classification for isiNdebele.
Autshumato English-Sesotho Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Aligned parallel corpora for the language pair English-Sesotho. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
Autshumato English-Tshivenḓa Parallel Corpora
(North-West University; Centre for Text Technology (CTexT), 2020-09-30)
Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced ...
Autshumato Monolingual Sesotho Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
CTexTools 2
(North-West University, Centre for Text Technology (CTexT); South African Department of Arts and Culture, 2018-05-24) ~ - Resource Catalogue
CTexTools is a corpus query and manipulation tool primarily for the official South African languages. The tool supports the creation of frequency and ...
Autshumato Machine Translation Evaluation Set
(North-West University; Centre for Text Technology (CTexT); Department of Arts and Culture, South Africa, 2017-12-15) ~ - Resource Catalogue
Comparable evaluation data for use in automatic machine translation evaluations. The evaluation set consists of 500 sentences translated separately by ...
Afrikaans text unit identification data
(Centre for Text Technology, North-West University, 2006) ~ - Resource Catalogue
This dataset was developed during a masters degree and used in the development of a text unit identifier capable of tagging sentences, named-entities, ...
Setswana Test suite and Treebank
(North-West University, 2018-03-27) ~ - Resource Catalogue
The main aim of the PhD study "A computational syntactic analysis of Setswana"(AS Berg, May 2018) is the computational syntactic analysis of the Setswana ...
Afrikaans lexical blends dataset
(North-West University, 2023-12)
This a dataset of Afrikaans blend constructions that have been collected and analysed using the Levenshtein distance metric. This dataset serves as the ...
Afribooms Afrikaans Dependency Treebank
(North-West University; Centre for Text Technology (CTexT); Katholieke Universiteit Leuven (Belgium), 2015-02-10) ~ - Resource Catalogue
This is the annotated corpus developed for Afrikaans for the Afribooms project. The corpus includes annotations for lemma, part-of-speech (POS) and ...