Autshumato Xitsonga Monolingual Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-12-12) ~ - Resource Catalogue
Xitsonga monolingual corpus, a deliverable of the Autshumato project. The data is provided as a UTF-8 text file, with each sentence on a new line.
Lwazi Sepedi TTS corpus
(Meraka Institute, CSIR, 2013-03-27) ~ - Resource Catalogue
Orthographic and phonemically aligned transcriptions
Lwazi Xitsonga TTS corpus
(Meraka Institute, CSIR, 2013-03-27) ~ - Resource Catalogue
Orthographic and phonemically aligned transcriptions
Autshumato English-Afrikaans Translation Memory
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ - Resource Catalogue
Translation memory from English (EN-GB) to Afrikaans, in the government domain for use in the Autshumato ITE application.
Verbtone Sepedi
(University of the Witwatersrand, 2015-01-27) ~ - Resource Index
Recordings of sentences with verb structures, showing one or two high tones on differing morphological constituents.
Lwazi isiXhosa ASR corpus
(Meraka Institute, CSIR, 2013-04-02) ~ - Resource Catalogue
Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Autshumato English-Sesotho sa Leboa Translation Memory
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ - Resource Catalogue
Translation memory from English (EN-GB) to Sesotho sa Leboa, in the government domain for use in the Autshumato ITE application.
South African Multilingual Proper Names (Multipron) Corpus
(Molo Afrika Speech Technologies, 2013-10-03) ~ - Resource Catalogue
Audio, orthographic transcriptions, and auditorily verified broad phonemic transcriptions of proper names in four languages, produced by speakers of the same four languages.
Afrikaans Part of Speech Data
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ - Resource Index
POS-annotated data used to train a POS tagger. The tagset was designed specifically for Afrikaans and consists of 139 POS tags.
NCHLT Tshivenda Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...