Search
Now showing items 11-20 of 60
Lara2
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Tool for annotating texts with lemma, part of speech and morphological analysis information
Mburisano Covid-19 multilingual corpus
(CSIR Voice Computing, 2020-12-04)
This corpus was created to aid development of the AwezaMed Covid-19 speech-to-speech mobile application. The project within which it was created, ...
NCHLT Siswati FLAIR-forward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
Monolingual Siswati Corpus
(North-West University - Centre for Text Technology (CTexT), 2022-03-31)
Monolingual corpus for SiSwati. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data sourced ...
Multilingual Information Communication Technology Terminology List
(Terminology Coordination Section of the National Language Service, Department of Arts and Culture, 2017-03-03) ~ - Resource Index
132 English source terms with their equivalents in the ten other official South African languages. Originally initiated by the Department of Communications, ...
NCHLT Siswati word2vec-CBOW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides ...
Multilingual HIV/AIDS Terminology List
(Terminology Coordination Section of the National Language Service, Department of Arts and Culture, 2017-02-15) ~ - Resource Index
586 English source terms with their equivalents in the ten other official South African languages. The list was compiled in collaboration with subject ...
NCHLT Siswati Named Entity Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.
NCHLT Siswati Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
Multilingual Natural Sciences & Technology Terminology List (Grade 4 - 6)
(Terminology Coordination Section of the National Language Service, Department of Arts and Culture, 2017-03-03) ~ - Resource Index
2756 English source terms with their equivalents in the ten other official South African languages. The list was populated from terms excerpted from ...