
NCHLT Speech II Corpus
The speech corpus, generated from aligned audio samples from the National Parliament using Hansard transcriptions, is provided as audio and transcriptions. The XML files provide the following metadata for each session:
- audio filename
- audio orthography
- GOP (goodness of pronunciation) score
- start time (seconds)
- end time (seconds)
The audio files are formatted as 16-bit signed integer PCM, single channel, with a 16 kHz sample rate.
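The audio format stated in the description can be checked on a downloaded file before further processing. The following is a minimal sketch using Python's standard `wave` module; the function names `read_wav_format` and `matches_corpus_format` and the `EXPECTED` dictionary are illustrative assumptions and are not part of the corpus distribution.

```python
import wave

# Format stated in the corpus description: 16-bit PCM, single channel, 16 kHz.
# Names below are hypothetical helpers, not shipped with the corpus.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def read_wav_format(path):
    """Return the channel count, sample width and sample rate of a WAV file."""
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16 bits
            "sample_rate_hz": wav.getframerate(),
        }


def matches_corpus_format(path):
    """True if the file has the layout given in the corpus description."""
    return read_wav_format(path) == EXPECTED
```

Calling `matches_corpus_format("some_file.wav")` on each audio file, where the path is a placeholder, would flag any file that deviates from the documented format. The `wave` module only reads uncompressed PCM, and 16-bit samples in WAV files are signed by convention, so signedness is not checked separately here.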
Karen Calteaux
KCalteaux@csir.co.za
Meraka Institute, CSIR
Creative Commons Attribution 3.0 South Africa (CC BY 3.0 ZA): http://creativecommons.org/licenses/by/3.0/za/
English
Jaco Badenhorst; Febe de Wet; Neil Kleynhans; Thipe Modipa
Alfred Tshoane; Georg Schlunz; Stanly Ramunyisi; Raymond Molapo; Nic de Vries
https://hdl.handle.net/20.500.12185/273
Speech
Data
Monolingual speech corpora: Annotated
5.6 GB
1
Text; 16 kHz; 16 bit; *.wav
NCHLT Speech II
Audio recordings smartphone-collected in non-studio environment; Text prompts from various sources, predominantly from .gov.za (web)
Monolingual Speech Corpora: Annotated
Resource Catalogue
Resource Index
eng
2018-02-06T09:46:40Z; 2018-03-05T15:23:12Z
2018-02-06T09:46:40Z; 2018-03-05T15:23:12Z
2016-05-09


This item appears in the following Collection(s)

  • Resource Catalogue [349]
    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
  • Resource Index [411]
    A collection of language resource metadata mostly collected during the NHN funded technology audit of 2009, as well as the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
