Autshumato English-Sesotho sa Leboa Translation Memory
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ - Resource Catalogue
Translation memory from English (EN-GB) to Sesotho sa Leboa in the government domain, for use in the Autshumato ITE application.
COVID-19 Multilingual Terminology
(City of Tshwane; South African Centre for Digital Language Resources (SADiLaR); Department of Science and Innovation; Pan South African Language Board (PanSALB), 2021-07)
COVID-19 multilingual terminology list document in all the South African languages. The development of this terminology list was initiated by City of ...
Autshumato Monolingual Setswana Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
Lara2
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Tool for annotating texts with lemma, part-of-speech and morphological analysis information.
NCHLT isiXhosa Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
Afrikaans Part of Speech Data
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ - Resource Index
POS-annotated data used to train a POS tagger. The tagset was specifically designed for Afrikaans and consists of 139 POS tags.
NCHLT Tshivenda Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
Autshumato English-Xitsonga Parallel Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-12-11) ~ - Resource Catalogue
Aligned English-Xitsonga parallel corpus. The data is given as two separate UTF-8 text files, with each segment on a newline.
NCHLT Sesotho Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
NCHLT English Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2016-09-09) ~ - Resource Catalogue
Collection consisting of a clean corpus, lexicon, frequency list and named-entity lists developed during the NCHLT Text project.