Search

Now showing items 1-10 of 18

NCHLT Sepedi Annotated Text Corpora

Martin Puttkammer; Martin Schlemmer; Ruan Bekker (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.

Lwazi II Sepedi TTS Corpus

Daniel van Niekerk; Georg Schlünz (Meraka Institute, CSIR; North-West University, 2015-11-20) ~ Resource Catalogue

Orthographic and phonemically aligned transcriptions.

Lwazi Sepedi Pronunciation Dictionary

Marelie Davel (Meraka Institute, CSIR, 2013-04-01) ~ Resource Catalogue

General phonemic pronunciations for frequently occurring words in SA languages. Dictionaries were developed to be practically usable for speech technology ...

NCHLT-inlang Pronunciation Dictionaries

Marelie Davel (Meraka Institute, CSIR; North-West University, 2014-07-04) ~ Resource Catalogue

Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations ...

Lwazi Sepedi TTS corpus

Daniel van Niekerk; Etienne Barnard; Marelie Davel; Aby Louw; Alta de Waal (Meraka Institute, CSIR, 2013-03-27) ~ Resource Catalogue

Orthographic and phonemically aligned transcriptions

Autshumato English-Sesotho sa Leboa Translation Memory

Cindy McKellar; Marissa Griesel; Handré Groenewald (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue

Translation memory from English (EN-GB) to Sesotho sa Leboa, in the government domain for use in the Autshumato ITE application.

Autshumato English-Sesotho sa Leboa Parallel Corpora

D.P. Snyman; Cindy McKellar; Handré Groenewald (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue

Parallel corpora aligned on sentence level through a combination of automatic and manual alignment techniques. The parallel corpora were obtained from ...

NCHLT Sepedi Text Corpora

Martin Puttkammer; Martin Schlemmer; Wikus Pienaar; Ruan Bekker (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...

NCHLT Sepedi Speech Corpus

Charl van Heerden; Etienne Barnard; Jaco Badenhorst; Marelie Davel; Alta de Waal (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue

Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.

NCHLT Sepedi Named Entity Annotated Corpus

D.J. Prinsloo; Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.

View previous page
1
2
View next page