Monolingual isiXhosa corpus
(North-West University; Centre for Text Technology (CTexT), 2019-11-30) ~ - Resource Catalogue
Monolingual corpus for isiXhosa. The data is provided as a single UTF-8 text file, with each segment on its own line.
The dataset contains existing data ...
NCHLT Tshivenda Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
NCHLT isiZulu Morphological Decomposer
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Morphological decomposer developed during the NCHLT Text project.
NCHLT isiZulu word2vec-CBOW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides ...
Autshumato ITE
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ - Resource Catalogue
Integrated Translation Environment. Combines multiple translation tools into one environment.
USAf National Language Resources Audit 2023
(South African Centre for Digital Language Resources, 2023-10)
This report documents the findings of a comprehensive language resources audit conducted by the South African Centre for Digital Language Resources ...
Sesotho sa Leboa Genre Classification Corpus
(Trifonius, 2013-06-19) ~ - Resource Catalogue
Contains training and testing data for Genre Classification for Sesotho sa Leboa.
isiNdebele Custom Dictionary for Government Domain
(North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ - Resource Catalogue
Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
isiNdebele Genre Classification Corpus
(Trifonius, 2013-06-19) ~ - Resource Catalogue
Contains training and testing data for Genre Classification for isiNdebele.
POS annotated corpus with 5 different text types for isiZulu
(Centre for Text Technology (CTexT), 2024-01-31)
This is a POS-annotated corpus covering five different text types for isiZulu.
The text types included are:
- CAPS gr12 (Academic) - https://www.educat ...