Now showing items 21-30 of 87
Mburisano Covid-19 multilingual corpus
(CSIR Voice Computing, 2020-12-04)
This corpus was created to aid development of the AwezaMed Covid-19 speech-to-speech mobile application. The project within which it was created, ...
NCHLT isiNdebele fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
English-IsiNdebele Glossary of Medical Terms
(University of South Africa (UNISA), 2021-09-01)
An English-isiNdebele glossary of medical terms, compiled by a PhD candidate as part of a PhD project.
Spelt
(Translate.org.za, 2015-01-28) ~ - Resource Index
Spelt allows a linguist to classify surface forms of words. The word can be associated with a root form and with a word classification. The primary use ...
Multilingual Information Communication Technology Terminology List
(Terminology Coordination Section of the National Language Service, Department of Arts and Culture, 2017-03-03) ~ - Resource Index
132 English source terms with their equivalents in the ten other official South African languages. Originally initiated by the Department of Communications, ...
NCHLT isiNdebele Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatiser developed during the NCHLT Text project. Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
Multilingual HIV/AIDS Terminology List
(Terminology Coordination Section of the National Language Service, Department of Arts and Culture, 2017-02-15) ~ - Resource Index
586 English source terms with their equivalents in the ten other official South African languages. The list was compiled in collaboration with subject ...
NCHLT isiNdebele RoBERTa language model
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
NCHLT isiNdebele Speech Corpus
(Meraka Institute, CSIR; North-West University, 2014-07-08) ~ - Resource Catalogue
Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
Multilingual Natural Sciences & Technology Terminology List (Grade 4 - 6)
(Terminology Coordination Section of the National Language Service, Department of Arts and Culture, 2017-03-03) ~ - Resource Index
2756 English source terms with their equivalents in the ten other official South African languages. The list was populated from terms excerpted from ...