Autshumato Xitsonga Monolingual Corpora

Wikus Pienaar; Wildrich Fourie; Cindy McKellar (North-West University; Centre for Text Technology (CTexT), 2014-12-12) ~ Resource Catalogue

Xitsonga monolingual corpus, a deliverable of the Autshumato project. The data is provided as a UTF-8 text file, with each sentence on a new line.

Lwazi Sepedi TTS corpus

Daniel van Niekerk; Etienne Barnard; Marelie Davel; Aby Louw; Alta de Waal (Meraka Institute, CSIR, 2013-03-27) ~ Resource Catalogue

Orthographic and phonemically aligned transcriptions

Lwazi Xitsonga TTS corpus

Daniel van Niekerk; Etienne Barnard; Marelie Davel; Aby Louw; Alta de Waal (Meraka Institute, CSIR, 2013-03-27) ~ Resource Catalogue

Orthographic and phonemically aligned transcriptions

Autshumato English-Afrikaans Translation Memory

Cindy McKellar; Handré Groenewald (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue

Translation memory from English (EN-GB) to Afrikaans, in the government domain for use in the Autshumato ITE application.

Verbtone Sepedi

Unknown author (University of the Witwatersrand, 2015-01-27) ~ Resource Index

Recordings of sentences with verb structures, showing one or two high tones on differing morphological constituents.

Lwazi isiXhosa ASR corpus

Charl van Heerden; Etienne Barnard; Jaco Badenhorst; Marelie Davel (Meraka Institute, CSIR, 2013-04-02) ~ Resource Catalogue

Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.

Autshumato English-Sesotho sa Leboa Translation Memory

Cindy McKellar; Marissa Griesel; Handré Groenewald (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue

Translation memory from English (EN-GB) to Sesotho sa Leboa, in the government domain for use in the Autshumato ITE application.

South African Multilingual Proper Names (Multipron) Corpus

Etienne Barnard; Marelie Davel; Oluwapelumi Giwa; Nadia Barnard; Jean-Pierre Martens; Derik Thirion (Molo Afrika Speech Technologies, 2013-10-03) ~ Resource Catalogue

Audio, orthographic transcriptions, and auditorily verified broad phonemic transcriptions of proper names in four languages, produced by speakers of the same four languages.

Afrikaans Part of Speech Data

Unknown author (North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ Resource Index

POS-annotated data used to train a POS tagger. The tagset was specifically designed for Afrikaans and consists of 139 POS tags.

NCHLT Tshivenda Phrase Chunk Annotated Corpus

S.L. Tshikota; M.E. Takalani; A. Nyoni; Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
