
NCHLT-inlang Pronunciation Dictionaries
Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations for unseen words.
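This record does not describe the format of the rule sets. As a hedged illustration only, here is one generic way a context-sensitive letter-to-sound rule set can predict a pronunciation for an unseen word. The `Rule` layout, the `predict` function and `TOY_RULES` are all hypothetical. They are not the format or the algorithm of the distributed NCHLT rule files, and the toy rules do not describe any of the 11 languages.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    left: str      # required left context ("" matches anything; "#" is a word boundary)
    grapheme: str  # the single letter being converted
    right: str     # required right context
    phone: str     # output X-SAMPA phone; "" means the letter is silent


def predict(word: str, rules: list[Rule]) -> list[str]:
    """Convert a word to phones one letter at a time.

    For each letter, the first matching rule in list order wins, so
    context-specific rules must precede the general fallback for that letter.
    """
    padded = f"#{word}#"  # add boundary markers so rules can test word edges
    phones: list[str] = []
    for i in range(1, len(padded) - 1):
        g = padded[i]
        for r in rules:
            if (r.grapheme == g
                    and padded[:i].endswith(r.left)
                    and padded[i + 1:].startswith(r.right)):
                if r.phone:
                    phones.append(r.phone)
                break
        else:
            # No rule covered this letter: the rule set is incomplete.
            raise KeyError(f"no rule for grapheme {g!r} in {word!r}")
    return phones


# Hypothetical toy rules, most specific first within each letter.
TOY_RULES = [
    Rule("", "c", "e", "s"),  # 'c' before 'e' -> s
    Rule("", "c", "", "k"),   # otherwise 'c' -> k
    Rule("", "a", "", "a"),
    Rule("", "e", "#", ""),   # word-final 'e' is silent
    Rule("", "e", "", "E"),
    Rule("", "t", "", "t"),
]

print(predict("cat", TOY_RULES))  # ['k', 'a', 't']
```

Ordering rules from most to least specific keeps the lookup simple: the first match is always the most specific applicable rule, and a context-free rule at the end of each letter's group acts as its default.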
Karen Calteaux
Meraka Institute, CSIR; North-West University
Creative Commons Attribution 3.0 Unported License (CC BY 3.0)
Afrikaans; English; isiNdebele; isiXhosa; isiZulu; Sesotho sa Leboa (Sepedi); Setswana; Sesotho; Siswati; Tshivenda; Xitsonga
Marelie Davel
Charl van Heerden; Willem Basson; Simon Kemisho; Thipe Modipa; Mpho Kgampe; Etienne Barnard; Martin Puttkammer; various language practitioners from C-Trans (NWU); Translation World.
E. Barnard, M. H. Davel, C. van Heerden, F. de Wet and J. Badenhorst, "The NCHLT corpus of the South African languages", in Proc. SLTU, May 2014.
Pronunciation dictionaries
1.1 Mb
15,000 words per language
Text: UTF-8, tab-delimited; Pronunciations: X-SAMPA; Audio: 44,100 Hz, 16-bit mono WAV
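A minimal sketch of loading one of the UTF-8, tab-delimited dictionary files into memory. The column layout is an assumption: one entry per line, with the word in the first column and the pronunciation in the second as space-separated X-SAMPA symbols. Repeated words are kept as pronunciation variants. The function name `load_dictionary` and the sample entries are hypothetical and are not taken from the dictionaries.

```python
import io
from collections import defaultdict


def load_dictionary(lines):
    """Read 'word<TAB>phones' lines into {word: [[phone, ...], ...]}.

    Assumed (hypothetical) layout: column 1 is the orthographic word,
    column 2 is the pronunciation as space-separated X-SAMPA symbols.
    A word that appears on several lines keeps every variant, in order.
    """
    lexicon = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected a tab between word and pronunciation")
        word, pron = fields[0], fields[1]
        lexicon[word].append(pron.split())
    return dict(lexicon)


# Made-up entries for illustration. For a real file, open it with
# open(path, encoding="utf-8") and pass the file object instead.
sample = io.StringIO("cat\tk a t\ncat\tk { t\n\ndog\td O g\n")
lexicon = load_dictionary(sample)
print(lexicon["cat"])  # [['k', 'a', 't'], ['k', '{', 't']]
```

The lines are split by hand rather than with the `csv` module because X-SAMPA uses characters such as `"` (primary stress) that a quote-aware CSV reader could misinterpret.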
NCHLT Speech
15,000 generic words
Resource Catalogue
Resource Index
afr; eng; nbl; xho; zul; sot; nso; tsn; ssw; ven; tso
2018-02-05T20:18:41Z; 2018-03-05T17:48:03Z

This item appears in the following Collection(s)

  • Resource Catalogue [349]
    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
  • Resource Index [412]
    A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.