NCHLT Speech II Corpus

Date

2016-05-09

Authors

Jaco Badenhorst
Febe de Wet
Neil Kleynhans
Thipe Modipa

Publisher

Meraka Institute, CSIR

Description

The speech corpus was generated by aligning audio recordings from the National Parliament with their Hansard transcriptions, and is provided as audio files with accompanying transcriptions. For each session, the XML files provide the following metadata:
- audio filename
- audio orthography
- GOP (goodness of pronunciation) score
- start time (seconds)
- end time (seconds)

The audio files are formatted as 16-bit signed integer PCM, single channel, with a 16 kHz sample rate.
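As an illustration of how the per-session metadata and the stated audio format might be consumed, the following Python sketch parses a session XML file into records and checks whether a WAV file matches the described format (16-bit PCM, mono, 16 kHz). The XML element and attribute names used here (`segment`, `audio`, `orth`, `gop`, `start`, `end`) and the sample values are hypothetical: the description lists the fields but not the schema, so they would need to be adapted to the actual files. The format check relies only on Python's standard `wave` module.

```python
import io
import wave
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# HYPOTHETICAL session layout: element/attribute names and values are
# illustrative assumptions, not the corpus's documented XML schema.
SAMPLE_XML = """<session>
  <segment audio="sess01_0001.wav" gop="-1.25" start="12.40" end="15.92">
    <orth>hansard text for this segment</orth>
  </segment>
</session>"""


@dataclass
class Segment:
    audio: str   # audio filename
    orth: str    # audio orthography (transcription)
    gop: float   # goodness of pronunciation score
    start: float # start time in seconds
    end: float   # end time in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_session(xml_text: str) -> list[Segment]:
    """Collect one Segment per (hypothetical) <segment> element."""
    root = ET.fromstring(xml_text)
    return [
        Segment(
            audio=seg.get("audio"),
            orth=(seg.findtext("orth") or "").strip(),
            gop=float(seg.get("gop")),
            start=float(seg.get("start")),
            end=float(seg.get("end")),
        )
        for seg in root.iter("segment")
    ]


def has_expected_format(wav_file) -> bool:
    """True if the WAV header says 16-bit PCM, single channel, 16 kHz."""
    with wave.open(wav_file, "rb") as w:
        return (
            w.getsampwidth() == 2        # 16-bit samples
            and w.getnchannels() == 1    # single channel
            and w.getframerate() == 16000
            and w.getcomptype() == "NONE"  # uncompressed PCM
        )


def make_silent_wav(rate: int = 16000, n_frames: int = 160) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    buf.seek(0)
    return buf
```

For a real file, `has_expected_format("path/to/file.wav")` accepts a path as well as a file object, and `Segment.duration` gives the segment length implied by the start and end times.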

Verification status

Level 0