Linguistically enriched corpora for conjunctively written South African languages
(North-West University, Centre for Language Technology (CTexT), 2021-09)
This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family ...
NCHLT isiNdebele Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Xitsonga Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
Multilingual Illustrated Dictionary with interactive games
(Centre for Text Technology (CTexT); Pharos Dictionaries, 2013-07-01) - Resource Index
Multilingual Illustrated Dictionary with interactive games and pronunciation for 7 of SA's official languages
Pretoria Sepedi Corpus POS tagged
(Department of African Languages - University of Pretoria, 2015-01-27) - Resource Index
The tagged Pretoria Sepedi Corpus for part-of-speech (POS) tagging. Intended for grammatical, morphological, lexical, and syntactic analysis.
NCHLT Sesotho Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
NCHLT isiXhosa Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
Autshumato Monolingual Afrikaans Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
Bukantswe Sesotho-English Bilingual Dictionary
(North-West University, 2016-07-07) - Resource Catalogue
Bilingual English-Sesotho dictionary. This dataset represents a basic Sesotho dictionary compiled in the creation of a Sesotho language resource. The ...
NCHLT Siswati Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...