Setswana Custom Dictionary for Government Domain
(North-West University; Centre for Text Technology (CTexT), 2013-02-22) - Resource Catalogue
Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
NCHLT Setswana Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
NCHLT Setswana RoBERTa language model
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
NCHLT Setswana fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
NCHLT Setswana fastText-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...
Autshumato TMX Integrator
(North-West University; Centre for Text Technology (CTexT), 2013-06-20) - Resource Catalogue
Utility to merge multiple translation memories over a network using Subversion
NCHLT Setswana word2vec-CBOW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides ...
Autshumato Multilingual Word and Phrase Translations
(North-West University; Centre for Text Technology (CTexT), 2016-01-20) - Resource Catalogue
Word and phrase lists aligned from English to the other official South African languages.
NCHLT Setswana Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
NCHLT Part of Speech Taggers
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) - Resource Catalogue
Part of speech taggers developed during the NCHLT Text project.
Available for the following languages: Afrikaans, English, isiNdebele, isiXhosa, isiZulu, ...