Search
NCHLT Xitsonga Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
NCHLT Xitsonga Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
Human Language Technology Audit 2017/18
(CSIR, 2018-08-31)
This document reports on all work conducted in the 2017/18 audit of human language technology (HLT) resources available in South Africa. The ...
NCHLT Xitsonga RoBERTa language model
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
NCHLT Xitsonga word2vec-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
Xitsonga Genre Classification Corpus
(Trifonius, 2013-06-19) - Resource Catalogue
Contains training and testing data for Genre Classification for Xitsonga.
Autshumato English-Xitsonga Manually Translated Parallel Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-12-12) - Resource Catalogue
Aligned English-Xitsonga parallel corpus. The data is given as two separate UTF-8 text files, with each segment on a new line.
Autshumato TMX Integrator
(North-West University; Centre for Text Technology (CTexT), 2013-06-20) - Resource Catalogue
Utility to merge multiple translation memories over a network using Subversion.
Autshumato Multilingual Word and Phrase Translations
(North-West University; Centre for Text Technology (CTexT), 2016-01-20) - Resource Catalogue
Word and phrase lists aligned from English to the other official South African languages.
NCHLT Part of Speech Taggers
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Part of speech taggers developed during the NCHLT Text project.
Available for the following languages: Afrikaans, English, isiNdebele, isiXhosa, isiZulu, ...