
NCHLT Speech II Corpus
This speech corpus was generated from audio samples from the National Parliament, aligned using Hansard transcriptions. It is provided as audio files with accompanying transcriptions. The XML files provide the following metadata for each session:
  • audio filename
  • audio orthography
  • GOP (goodness of pronunciation) score
  • start time (seconds)
  • end time (seconds)
The audio files are formatted as 16-bit signed integer PCM, single channel, with a 16 kHz sample rate.
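The audio format stated in the description (16-bit signed PCM, single channel, 16 kHz) can be checked with Python's standard `wave` module before processing a file. The following is a minimal sketch, not part of the corpus distribution: the function names `matches_corpus_format` and `segment_frames` are hypothetical, and the second helper only assumes that the start and end times in the XML metadata are offsets in seconds into the audio file, as the description suggests.

```python
import wave

# Expected format, taken from the corpus description.
EXPECTED_CHANNELS = 1          # single channel
EXPECTED_SAMPLE_WIDTH = 2      # 16-bit samples = 2 bytes per sample
EXPECTED_SAMPLE_RATE = 16000   # 16 kHz


def matches_corpus_format(path):
    """Return True if the WAV file at `path` is uncompressed 16-bit, mono, 16 kHz PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getframerate() == EXPECTED_SAMPLE_RATE
            and wav.getcomptype() == "NONE"
        )


def segment_frames(start_s, end_s, sample_rate=EXPECTED_SAMPLE_RATE):
    """Convert start/end times in seconds to (first_frame, frame_count).

    Hypothetical helper: assumes the times are offsets into the audio file.
    """
    first = round(start_s * sample_rate)
    last = round(end_s * sample_rate)
    return first, max(0, last - first)
```

A file that fails the check may need resampling or conversion before it is used alongside the rest of the corpus.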
Karen Calteaux
Meraka Institute, CSIR
Creative Commons Attribution 3.0 South Africa (CC BY 3.0 ZA)
Jaco Badenhorst; Febe de Wet; Neil Kleynhans; Thipe Modipa
Alfred Tshoane; Georg Schlunz; Stanly Ramunyisi; Raymond Molapo; Nic de Vries
Monolingual speech corpora: Annotated
5.6 GB
Text; 16 kHz; 16 bit; *.wav
Audio recordings collected by smartphone in a non-studio environment; text prompts from various sources, predominantly from the web
Monolingual Speech Corpora: Annotated
Resource Catalogue
Resource Index
2018-02-06T09:46:40Z; 2018-03-05T15:23:12Z

This item appears in the following Collection(s)

  • Resource Catalogue [349]
    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
  • Resource Index [411]
    A collection of language resource metadata mostly collected during the NHN funded technology audit of 2009, as well as the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
