Browsing Resource Catalogue by Title
Now showing items 135-154 of 349
Lwazi Xitsonga ASR corpus
(Meraka Institute, CSIR, 2013-04-02) Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Xitsonga Pronunciation Dictionary
(Meraka Institute, CSIR, 2013-04-01) General phonemic pronunciations for frequently occurring words in SA languages. Dictionaries were developed to be practically usable for speech technology ...
Lwazi Xitsonga TTS corpus
(Meraka Institute, CSIR, 2013-03-27) Orthographic and phonemically aligned transcriptions.
Monolingual isiXhosa corpus
(North-West University - Centre for Text Technology (CTexT), 2019-11-30) Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a new line. The dataset contains existing data ...
Morphologically annotated corpus for isiNdebele
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...
Morphologically annotated corpus for isiXhosa
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for isiZulu
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Sepedi
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in Sepedi converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Sesotho
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in Sesotho converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Setswana
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in Setswana converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Siswati
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Tshivenḓa
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in Tshivenḓa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Xitsonga
(Centre for Text Technology (CTexT), 2024-01-31) NCHLT corpus of morphologically annotated tokens in Xitsonga converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Multilingual Linguistic Terminology
(UNISA, 2022-09-20) Multilingual Linguistic Terminology Project. Termbanks of Linguistic terminology for South African languages. Version 1.0. https://linguistictermino ...
NCHLT Afrikaans Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Siswati Morphological Decomposer
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) Morphological decomposer developed during the NCHLT Text project.
NCHLT Afrikaans Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
NCHLT Afrikaans Auxiliary Speech Corpus
(CSIR Meraka Institute; North-West University, 2019-06-01) The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
NCHLT Afrikaans fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
NCHLT Afrikaans fastText-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...