Now showing items 1-10 of 24
Lwazi II Proper Name Call Routing Telephone Corpus
(Meraka Institute, CSIR; North-West University, 2015-11-20) - Resource Catalogue
Short prompts of proper names and language names collected via the telephone network.
SAE Radio News Speech Corpus
(Stellenbosch University, 2015-01-27) - Resource Index
News bulletins purchased from the SABC. Data to be used for the development of a large vocabulary continuous speech recognition system for South African ...
NCHLT-inlang Pronunciation Dictionaries
(Meraka Institute, CSIR; North-West University, 2014-07-04) - Resource Catalogue
Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations ...
Lwazi English ASR corpus
(Meraka Institute, CSIR, 2013-04-02) - Resource Catalogue
Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
South African Multilingual Proper Names (Multipron) Corpus
(Molo Afrika Speech Technologies, 2013-10-03) - Resource Catalogue
Audio, orthographic transcriptions and auditorily verified broad phonemic transcriptions of proper names in four languages, produced by speakers of the same four languages.
Lwazi English Pronunciation Dictionary
(Meraka Institute, CSIR, 2013-04-01) - Resource Catalogue
General phonemic pronunciations for frequently occurring words in SA languages. Dictionaries were developed to be practically usable for speech technology ...
SAE Pronunciation Dictionary
(Stellenbosch University, 2015-01-27) - Resource Index
Pronunciation dictionary compiled from newspaper text and radio news transcriptions. Dictionary to be used for the development of a large vocabulary ...
NCHLT Speech II Corpus
(Meraka Institute, CSIR, 2016-05-09) - Resource Catalogue
The speech corpus, generated by aligning audio samples from the National Parliament with Hansard transcriptions, is provided in terms of audio and ...
South African Broadcast News (SABN) Corpus
(Stellenbosch University; CSIR, 2018-02-27) - Resource Index
The corpus consists of approximately 20 hours of audio recordings from one of the country's main radio news channels, SAFM. Bulletins ...
Lwazi III English TTS Corpus
(Meraka Institute, CSIR, 2016-06-17) - Resource Catalogue
Complete audio recordings with orthographic transcriptions. TTS corpus for the standard SA dialect. This corpus was created to enable the building of a TTS voice.