Combination Tagger
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ - Resource Index
The combination tagger framework combines four taggers: MBT, SVM, MXPOST and TnT. Each tagger is assigned a weight that determines the strength of its vote for a tag.
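The weighted voting described above can be illustrated with a minimal sketch. It is not the Combination Tagger's actual implementation: the function name `weighted_vote`, the example weights, and the tag labels are all hypothetical, and the sketch assumes each tagger proposes exactly one tag per token.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the tag with the highest total weight.

    predictions: dict mapping tagger name -> tag proposed for one token
    weights:     dict mapping tagger name -> voting weight (hypothetical values)
    """
    scores = defaultdict(float)
    for tagger, tag in predictions.items():
        # Each tagger adds its weight to the tag it voted for.
        scores[tag] += weights[tagger]
    # Pick the tag with the largest accumulated weight.
    return max(scores.items(), key=lambda item: item[1])[0]


# Hypothetical weights and votes for a single token.
example_weights = {"MBT": 0.9, "SVM": 0.8, "MXPOST": 0.7, "TnT": 0.95}
example_votes = {"MBT": "NN", "SVM": "VB", "MXPOST": "NN", "TnT": "VB"}
winning_tag = weighted_vote(example_votes, example_weights)
```

In this made-up example two taggers vote for each tag, so the result depends on the weights rather than on a simple majority.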
Autshumato Multilingual Word and Phrase Translations
(North-West University; Centre for Text Technology (CTexT), 2016-01-20) ~ - Resource Catalogue
Word and phrase lists aligned from English to the other official South African languages.
NCHLT Siswati word2vec-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
Final year high school examination texts of South African home and first additional language subjects
(South African Centre for Digital Language Resources, 2022-11-16)
This data collection consists of reading comprehension and summary writing texts. The texts comprise the final year high school exam texts for ...
NCHLT Part of Speech Taggers
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Part of speech taggers developed during the NCHLT Text project.
Available for the following languages: Afrikaans, English, isiNdebele, isiXhosa, isiZulu, ...
Linguistically enriched corpora for conjunctively written South African languages
(North-West University, Centre for Text Technology (CTexT), 2021-09)
This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family ...
NCHLT Siswati Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
NCHLT Siswati fastText-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...
NCHLT Siswati Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
PSearch 1.1.
(North-West University; Centre for Text Technology (CTexT); Tilburg Centre for Cognition and Communication, 2015-01-30) ~ - Resource Index
PSearch is based on Paramsearch, a tool created by Antal van den Bosch for automatic algorithmic parameter optimisation for TiMBL and other machine ...