NCHLT Afrikaans Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part-of-speech tagged, and morphologically analysed corpora developed during the NCHLT Text project.
CTexT fastText Skipgram String Embeddings
(Centre for Text Technology (CTexT), 2022-01-10)
The CTexT Afrikaans fastText Skipgram String Embeddings is a 300-dimensional Afrikaans embedding model based on the Skipgram fastText architecture that ...
Lwazi II Afrikaans TTS Corpus
(Meraka Institute, CSIR; North-West University, 2015-11-20) ~ - Resource Catalogue
Orthographic and phonemically aligned transcriptions
AuCoPro Semantics Dataset
(North-West University; Centre for Text Technology (CTexT); CLiPS Research Center, University of Antwerp, Belgium, 2015-01-07) ~ - Resource Catalogue
The AuCoPro Semantics dataset serves for the automatic semantic analysis of compounds. It contains semantically annotated noun-noun compounds (NN) from ...
NCHLT Afrikaans word2vec-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
NCHLT Afrikaans FLAIR-backward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT)
(ICELDA; SADiLaR, 2021)
NOTE: THIS HAS BEEN SUPERSEDED. See https://hdl.handle.net/20.500.12185/585
The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) ...
Autshumato English-Afrikaans Parallel Corpora
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ - Resource Catalogue
Parallel corpora aligned on sentence level through a combination of automatic and manual alignment techniques. The parallel corpora were obtained from ...
Lwazi Afrikaans TTS corpus
(Meraka Institute, CSIR, 2013-03-27) ~ - Resource Catalogue
Orthographic and phonemically aligned transcriptions
Speect
(Meraka Institute, CSIR, 2013-07-15) ~ - Resource Catalogue
Speect is a multilingual text-to-speech (TTS) system. It offers a full TTS system (text analysis which decodes the text, and speech synthesis, which ...