Autshumato English-Sesotho Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Aligned parallel corpora for the language pair English-Sesotho. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
Morphologically annotated corpus for Tshivenḓa
(Centre for Text Technology (CTexT), 2024-01-31)
NCHLT corpus of morphologically annotated tokens in Tshivenḓa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Xitsonga
(Centre for Text Technology (CTexT), 2024-01-31)
NCHLT corpus of morphologically annotated tokens in Xitsonga converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Setswana
(Centre for Text Technology (CTexT), 2024-01-31)
NCHLT corpus of morphologically annotated tokens in Setswana converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Sepedi
(Centre for Text Technology (CTexT), 2024-01-31)
NCHLT corpus of morphologically annotated tokens in Sepedi converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Autshumato English-Tshivenḓa Parallel Corpora
(North-West University; Centre for Text Technology (CTexT), 2020-09-30)
Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced ...
Autshumato Monolingual Sesotho Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
CTexTools 2
(North-West University, Centre for Text Technology (CTexT); South African Department of Arts and Culture, 2018-05-24)
CTexTools is a corpus query and manipulation tool primarily for the official South African languages. The tool supports the creation of frequency and ...
Autshumato Machine Translation Evaluation Set
(North-West University; Centre for Text Technology (CTexT); Department of Arts and Culture, South Africa, 2017-12-15)
Comparable evaluation data for use in automatic machine translation evaluations. The evaluation set consists of 500 sentences translated separately by ...
Afrikaans text unit identification data
(Centre for Text Technology, North-West University, 2006)
This dataset was developed during a master's degree and used in the development of a text unit identifier capable of tagging sentences, named entities, ...