Autshumato Xitsonga Monolingual Corpora

Wikus Pienaar; Wildrich Fourie; Cindy McKellar (North-West University; Centre for Text Technology (CTexT), 2014-12-12) ~ Resource Catalogue

Xitsonga monolingual corpus, a deliverable of the Autshumato project. The data is given as a UTF-8 text file, with each sentence on a new line.

Autshumato English-Afrikaans Translation Memory

Cindy McKellar; Handré Groenewald (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue

Translation memory from English (EN-GB) to Afrikaans in the government domain, for use in the Autshumato ITE application.

Autshumato English-Sesotho sa Leboa Translation Memory

Cindy McKellar; Marissa Griesel; Handré Groenewald (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue

Translation memory from English (EN-GB) to Sesotho sa Leboa in the government domain, for use in the Autshumato ITE application.

Afrikaans Part of Speech Data

Unknown author (North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ Resource Index

POS-annotated data used to train a POS tagger. The tagset was specifically designed for Afrikaans and consists of 139 POS tags.

NCHLT Tshivenda Phrase Chunk Annotated Corpus

S.L. Tshikota; M.E. Takalani; A. Nyoni; Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...

Autshumato English-Xitsonga Parallel Corpora

Wikus Pienaar; Wildrich Fourie; Cindy McKellar (North-West University; Centre for Text Technology (CTexT), 2014-12-11) ~ Resource Catalogue

Aligned English-Xitsonga parallel corpus. The data is given as two separate UTF-8 text files, with each segment on a new line.

NCHLT Sesotho Annotated Text Corpora

Martin Puttkammer; Martin Schlemmer; Ruan Bekker (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part-of-speech tagged, and morphologically analysed corpora developed during the NCHLT Text project.

NCHLT English Text Corpora

Martin Puttkammer; Martin Schlemmer; Wikus Pienaar; Ruan Bekker (North-West University; Centre for Text Technology (CTexT), 2016-09-09) ~ Resource Catalogue

Collection consisting of a clean corpus, lexicon, frequency list and named-entity lists developed during the NCHLT Text project.

African Wordnet: isiZulu 1.0

African Wordnet Project (UNISA, 2017-06-20) ~ Resource Catalogue

Developed using the expand model with Princeton WordNet 2.0 as basis. Each wordnet contains synsets with at least the following fields: Word form (lemma; ...

isiZulu Custom Dictionary for Government Domain

Martin Puttkammer; Nico Oosthuizen; Wikus Pienaar (North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ Resource Catalogue

Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
