Search
NCHLT Setswana Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
Siswati Genre Classification Corpus
(Trifonius, 2013-06-19) - Resource Catalogue
Contains training and testing data for Genre Classification for Siswati.
NCHLT Sepedi GloVe embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embedding model based on the Global Vectors architecture (Pennington et al., 2014). The embeddings provide real-valued vector representations ...
Habakuk
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) - Resource Index
Habakuk is a rule-based hyphenator for Afrikaans, which can be implemented in any NLP system. It takes as input a string, and produces as output an ...
Autshumato Monolingual isiNdebele Corpus
(North-West University; Centre for Text Technology (CTexT), 2021-01-31)
Monolingual corpus for isiNdebele. The data is given as a single UTF-8 text file, with each segment on a newline.
isiZulu Genre Classification Corpus
(Trifonius, 2013-06-19) - Resource Catalogue
Contains training and testing data for Genre Classification for isiZulu.
NCHLT Xitsonga RoBERTa language model
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
Afrikaanse Speltoetser 3.1
(Centre for Text Technology (CTexT), 2013-07-01) - Resource Index
Afrikaans spelling checker that is compatible with Microsoft Office 2000 and up. This version of CTexT's well-known spelling checker for Afrikaans now ...
Unisa isiXhosa Text Corpus
(University of South Africa, 2015-01-27) - Resource Index
The main objective of the project was to develop a platform of computer supported basic linguistic resources for the 9 official African Languages of ...
NCHLT isiNdebele Named Entity Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) - Resource Catalogue
Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.