Now showing items 61-70 of 134
NCHLT isiNdebele Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
Tshivenda Custom Dictionary for Government Domain
(North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ - Resource Catalogue
Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
NCHLT Xitsonga Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
NCHLT Xitsonga Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
isiXhosa Custom Dictionary for Government Domain
(North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ - Resource Catalogue
Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
Bilingual English-isiXhosa corpus
(North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ - Resource Catalogue
Aligned parallel corpora for the following language pair: English-isiXhosa.
The data is given as two separate UTF-8 text files, with each segment on a ...
NCHLT isiXhosa Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
Afrikaans Genre Classification Corpus
(Trifonius, 2013-06-19) ~ - Resource Catalogue
Contains training and testing data for Genre Classification for Afrikaans.
NCHLT isiZulu Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
Afrikaans speaking children's first lexical items
(North-West University, 2018-05-17) ~ - Resource Catalogue
Data collected for a master's study in Afrikaans linguistics. The data consist of the first lexical items of 21 Afrikaans-speaking children. The lexical ...