Search
Now showing items 141-150 of 238
NCHLT isiNdebele Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
Lwazi isiNdebele ASR corpus
(Meraka Institute, CSIR, 2013-04-02) ~ - Resource Catalogue
Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
African Speech Technology isiXhosa Text Corpus
(North-West University; Stellenbosch University; University of Transkei; University of Free State (Qwa-Qwa campus); Rhodes University; University of KwaZulu-Natal; University of Western Cape, 2015-01-07) ~ - Resource Catalogue
Monolingual text corpus developed during the African Speech Technology project.
Lwazi Sesotho Pronunciation Dictionary
(Meraka Institute, CSIR, 2013-04-01) ~ - Resource Catalogue
General phonemic pronunciations for frequently occurring words in SA languages. Dictionaries were developed to be practically usable for speech technology ...
Lwazi II English TTS Corpus
(Meraka Institute, CSIR; North-West University, 2015-11-20) ~ - Resource Catalogue
Orthographic and phonemically aligned transcriptions.
Setswana Custom Dictionary for Government Domain
(North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ - Resource Catalogue
Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
African Speech Technology English-English Speech Corpus
(North-West University; Stellenbosch University; University of Transkei; University of Free State (Qwa-Qwa campus); Rhodes University; University of KwaZulu-Natal; University of Western Cape, 2014-12-11) ~ - Resource Catalogue
African Speech Technology speech and transcription data for the English-English database. The "speech" directory contains English speech as spoken by ...
Lwazi Sesotho TTS corpus
(Meraka Institute, CSIR, 2013-03-27) ~ - Resource Catalogue
Orthographic and phonemically aligned transcriptions
NCHLT Afrikaans Annotated Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
NCHLT Setswana Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Lemmatiser developed during the NCHLT Text project.
\n\n
Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...