Search
NCHLT Sepedi FLAIR-forward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
COVID-19 Multilingual Terminology
(City of Tshwane; South African Centre for Digital Language Resources (SADiLaR); Department of Science and Innovation; Pan South African Language Board (PanSALB), 2021-07)
COVID-19 multilingual terminology list document in all the South African languages. The development of this terminology list was initiated by City of ...
Mburisano Covid-19 multilingual corpus
(CSIR Voice Computing, 2020-12-04)
This corpus was created to aid development of the AwezaMed Covid-19 speech-to-speech mobile application. The project within which it was created, ...
NCHLT Sepedi fastText-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...
Autshumato English-Setswana Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Aligned parallel corpora for the language pair English-Setswana. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
NCHLT Sepedi fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
Autshumato English-Sepedi Parallel Corpora
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Aligned parallel corpora for the language pair English-Sepedi. The data is given as two separate UTF-8 text files, with each aligned segment on a newline. ...
Human Language Technology Audit 2017/18
(CSIR, 2018-08-31)
This document reports on all work conducted in the 2017/18 Audit of human language technology (HLT) resources available in South Africa project. The ...
SPCS Speech Corpus
(Council for Scientific and Industrial Research; North-West University, 2015-11-25)
Broadband speech corpus of approximately 10 hours and the corresponding transcriptions. The development process of the corpus involved the recording ...
NCHLT Sepedi FLAIR-backward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...