Browsing Resource Catalogue by Title

Now showing items 135-154 of 349

Lwazi Xitsonga ASR corpus

Charl van Heerden, et al. (Meraka Institute, CSIR, 2013-04-02) ~ Resource Catalogue

Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Xitsonga Pronunciation Dictionary

Marelie Davel (Meraka Institute, CSIR, 2013-04-01) ~ Resource Catalogue

General phonemic pronunciations for frequently occurring words in SA languages. Dictionaries were developed to be practically usable for speech technology ...
Lwazi Xitsonga TTS corpus

Daniel van Niekerk, et al. (Meraka Institute, CSIR, 2013-03-27) ~ Resource Catalogue

Orthographic and phonemically aligned transcriptions
Monolingual isiXhosa corpus

McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue

Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data ...
Morphologically annotated corpus for isiNdebele

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...
Morphologically annotated corpus for isiXhosa

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for isiZulu

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Sepedi

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Sepedi converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Sesotho

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Sesotho converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Setswana

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Setswana converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Siswati

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Tshivenḓa

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Tshivenḓa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Xitsonga

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Xitsonga converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Multilingual Linguistic Terminology

Griesel, Marissa (UNISA, 2022-09-20)

Multilingual Linguistic Terminology Project: termbanks of linguistic terminology for South African languages. Version 1.0. https://linguistictermino ...
NCHLT Afrikaans Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Siswati Morphological Decomposer

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Morphological decomposer developed during the NCHLT Text project.
NCHLT Afrikaans Annotated Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
NCHLT Afrikaans Auxiliary Speech Corpus

Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue

The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
NCHLT Afrikaans fastText-CBoW embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
NCHLT Afrikaans fastText-Skipgram embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...
