NCHLT Speech II Corpus
Title | NCHLT Speech II Corpus |
Description | The speech corpus was generated from audio samples of the National Parliament, aligned using Hansard transcriptions, and is provided as audio files with their transcriptions. For each session, the XML files provide the following metadata: audio filename; audio orthography; GOP (goodness of pronunciation) score; start time (seconds); end time (seconds). The audio files are 16-bit signed integer PCM, single channel, with a 16 kHz sample rate. |
Contact name | Karen Calteaux |
Contact email | KCalteaux@csir.co.za |
Publisher(s) | Meraka Institute, CSIR |
License | Creative Commons Attribution 3.0 South Africa (CC BY 3.0 ZA): http://creativecommons.org/licenses/by/3.0/za/ |
Language(s) | English |
Author(s) | Jaco Badenhorst; Febe de Wet; Neil Kleynhans; Thipe Modipa |
Contributor | Alfred Tshoane; Georg Schlunz; Stanly Ramunyisi; Raymond Molapo; Nic de Vries |
URI | https://hdl.handle.net/20.500.12185/273 |
Media type | Speech |
Type | Data |
Media category | Monolingual speech corpora: Annotated |
Format extent | 5.6 GB |
Version | 1 |
Format medium | Text; 16 kHz; 16 bit; *.wav |
Project | NCHLT Speech II |
Source | Audio recordings collected on smartphones in non-studio environments; text prompts from various sources, predominantly .gov.za websites |
Database | Monolingual Speech Corpora: Annotated |
Primary collection | Resource Catalogue |
Secondary collection | Resource Index |
ISO639 code | eng |
Submit date | 2018-02-06T09:46:40Z; 2018-03-05T15:23:12Z |
Date available | 2018-02-06T09:46:40Z; 2018-03-05T15:23:12Z |
Date created | 2016-05-09 |
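The audio format and XML fields listed in the description can be checked programmatically before training or evaluation. The following is a minimal sketch using only the Python standard library. `check_wav_format` uses the `wave` module to confirm the stated 16-bit, single-channel, 16 kHz PCM format. `parse_segments` reads the per-utterance metadata. The XML schema is not documented in this record, so the element names used here (`segment`, `audio`, `orthography`, `gop`, `start`, `end`) are hypothetical placeholders and should be adjusted to match the actual files.

```python
import io
import wave
import xml.etree.ElementTree as ET

# Audio format stated in the catalogue record: 16-bit signed PCM, mono, 16 kHz.
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_CHANNELS = 1            # single channel
EXPECTED_RATE_HZ = 16000         # 16 kHz


def check_wav_format(source) -> bool:
    """Return True if a WAV file (path or binary file object) matches the stated format."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getframerate() == EXPECTED_RATE_HZ
        )


def parse_segments(xml_text: str) -> list[dict]:
    """Extract per-utterance metadata from one session's XML.

    HYPOTHETICAL schema: each utterance is assumed to be a <segment> element with
    <audio>, <orthography>, <gop>, <start> and <end> children.
    """
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.iter("segment"):  # hypothetical tag name
        start = float(seg.findtext("start"))  # seconds
        end = float(seg.findtext("end"))      # seconds
        segments.append({
            "audio": seg.findtext("audio"),
            "orthography": seg.findtext("orthography"),
            "gop": float(seg.findtext("gop")),
            "start": start,
            "end": end,
            "duration": end - start,
        })
    return segments


if __name__ == "__main__":
    # Build one second of silence in the stated format, entirely in memory.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(EXPECTED_CHANNELS)
        out.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        out.setframerate(EXPECTED_RATE_HZ)
        out.writeframes(b"\x00\x00" * EXPECTED_RATE_HZ)
    buf.seek(0)
    print(check_wav_format(buf))
```

Checking the header fields rather than trusting file extensions helps catch resampled or stereo files that may have slipped into a local copy of the corpus.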
This item appears in the following collection(s):
- Resource Catalogue [350]: A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
- Resource Index [412]: A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.