Search
Now showing items 1-10 of 238
NCHLT Sesotho Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
\n\n
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
NCHLT Optical Character Recognition for South African Languages
(North-West University; Centre for Text Technology (CTexT), 2017-02-23) ~ - Resource Catalogue
An OCR system is an application that enables one to convert scanned paper documents into editable and searchable texts. The engine analyses the structure ...
NCHLT Tshivenda Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
AuCoPro Splitting Dataset
(North-West University; Centre for Text Technology (CTexT); Tilburg Centre for Cognition and Communication, 2015-01-07) ~ - Resource Catalogue
The AuCoPro Splitting dataset contains compounds annotated with their compound boundaries and linking morphemes for Afrikaans and Dutch.
NCHLT isiNdebele FLAIR-backward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
NCHLT Sepedi Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
African Speech Technology isiZulu Text Corpus
(North-West University; Stellenbosch University; University of Transkei; University of Free State (Qwa-Qwa campus); Rhodes University; University of KwaZulu-Natal; University of Western Cape, 2015-01-07) ~ - Resource Catalogue
Monolingual text corpus developed during the African Speech Technology project.
Sesotho Custom Dictionary for Government Domain
(North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ - Resource Catalogue
Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
NCHLT Xitsonga FLAIR-forward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
NCHLT isiZulu GloVe embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embedding model based on the Global Vectors architecture (Pennington et al., 2014). The embeddings provide real-valued vector representations ...