Search
Now showing items 41-50 of 240
Sepedi Grapheme-to-Phoneme Converter
(University of South Africa, 2015-01-28) - Resource Index
Converting morphemes of Sesotho sa Leboa to phonological representations.
EWA
(WAT (Afrikaans NLU), 2013-07-01) - Resource Index
Die Etimologiewoordeboek van die Afrikaanse Taal is an Afrikaans etymological dictionary, distributed in Folio Views for Windows 4.6.0.7.
Autshumato Xitsonga Monolingual Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-12-12) - Resource Catalogue
Xitsonga monolingual corpus, a deliverable of the Autshumato project. The data is given as a UTF-8 text file, with each sentence on a new line.
Autshumato English-Afrikaans Translation Memory
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) - Resource Catalogue
Translation memory from English (EN-GB) to Afrikaans, in the government domain for use in the Autshumato ITE application.
South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) version 2023-03
(ICELDA; SADiLaR, 2023-03)
The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...
Autshumato English-Sesotho sa Leboa Translation Memory
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) - Resource Catalogue
Translation memory from English (EN-GB) to Sesotho sa Leboa, in the government domain for use in the Autshumato ITE application.
Autshumato Monolingual Setswana Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...
Lara2
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Tool for annotating texts with lemma, part-of-speech, and morphological analysis information.
NCHLT isiXhosa Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
Afrikaans Part of Speech Data
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) - Resource Index
POS-annotated data used to train a POS tagger. The tagset was designed specifically for Afrikaans and consists of 139 POS tags.