NCHLT Afrikaans Named Entity Annotated Corpus

Gerhard van Huyssteen; Martin Puttkammer; E.B. Trollip; J.C. Liversage; Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.

Lwazi II Cross-lingual Proper Name Corpus

Marelie Davel; Mpho Kgampe (Meraka Institute, CSIR; North-West University, 2015-11-20) ~ Resource Catalogue

Prompted audio recordings of personal names in different languages, produced by 20 speakers with different language backgrounds.

Afrikaans Genre Classification Corpus

Gerhard van Huyssteen; D.P. Snyman (Trifonius, 2013-06-19) ~ Resource Catalogue

Contains training and testing data for Genre Classification for Afrikaans.

Lwazi Afrikaans Pronunciation Dictionary

Marelie Davel (Meraka Institute, CSIR, 2013-04-01) ~ Resource Catalogue

General phonemic pronunciations for frequently occurring words in SA languages. Dictionaries were developed to be practically usable for speech technology ...

South African Directory Enquiries (SADE) Name Corpus

Charl van Heerden; Marelie Davel; Oluwapelumi Giwa; J.W.F. Thirion (North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC, 2015-09-07) ~ Resource Catalogue

Audio and tagged orthographic transcriptions of South African names produced by first-language speakers of 4 languages: Afrikaans, English, isiZulu, ...

Autshumato Afrikaans-English Translation Memory

Cindy McKellar; Marissa Griesel; Handré Groenewald (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue

Translation memory from Afrikaans to English (EN-GB) in the government domain, for use in the Autshumato ITE application.

NCHLT Afrikaans Annotated Text Corpora

Martin Puttkammer; Martin Schlemmer; Ruan Bekker (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.

Lwazi II Afrikaans TTS Corpus

Daniel van Niekerk; Alta de Waal; Georg Schlünz (Meraka Institute, CSIR; North-West University, 2015-11-20) ~ Resource Catalogue

Orthographic and phonemically aligned transcriptions

Gerhard van Huyssteen; Walter Daelemans; Ben Verhoeven (North-West University; Centre for Text Technology (CTexT); CLiPS Research Center, University of Antwerp, Belgium, 2015-01-07) ~ Resource Catalogue

The AuCoPro Semantics dataset serves for the automatic semantic analysis of compounds. It contains semantically annotated noun-noun compounds (NN) from ...

Lwazi Afrikaans TTS corpus

Daniel van Niekerk; Etienne Barnard; Marelie Davel; Aby Louw; Alta de Waal (Meraka Institute, CSIR, 2013-03-27) ~ Resource Catalogue