NCHLT Speech II Corpus
Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/273'
dc.contact.email | KCalteaux@csir.co.za | |
dc.contact.name | Karen Calteaux | |
dc.contributor.author | Jaco Badenhorst | |
dc.contributor.author | Febe de Wet | |
dc.contributor.author | Neil Kleynhans | |
dc.contributor.author | Thipe Modipa | |
dc.contributor.other | Alfred Tshoane | |
dc.contributor.other | Georg Schlunz | |
dc.contributor.other | Stanly Ramunyisi | |
dc.contributor.other | Raymond Molapo | |
dc.contributor.other | Nic de Vries | |
dc.database | Monolingual Speech Corpora: Annotated | |
dc.date.accessioned | 2018-02-06T09:46:40Z | |
dc.date.accessioned | 2018-03-05T15:23:12Z | |
dc.date.available | 2018-02-06T09:46:40Z | |
dc.date.available | 2018-03-05T15:23:12Z | |
dc.date.issued | 2016-05-09 | |
dc.description | The speech corpus, generated by aligning audio recordings from the National Parliament with their Hansard transcriptions, is provided as audio files and transcriptions. XML files provide the following metadata for each session: audio filename; audio orthography; GOP (goodness of pronunciation) score; start time (seconds); end time (seconds). The audio files are 16-bit signed integer PCM, single-channel, with a 16 kHz sample rate. | |
dc.format.extent | 5.6 GB | |
dc.format.medium | Text | |
dc.format.medium | 16 kHz | |
dc.format.medium | 16 bit | |
dc.format.medium | *.wav | |
dc.identifier.uri | https://hdl.handle.net/20.500.12185/273 | |
dc.language.iso | eng | |
dc.languages | English | |
dc.media.category | Monolingual speech corpora: Annotated | |
dc.media.type | Speech | |
dc.project | NCHLT Speech II | |
dc.publisher | Meraka Institute, CSIR | |
dc.rights.license | Creative Commons Attribution 3.0 South Africa (CC BY 3.0 ZA): http://creativecommons.org/licenses/by/3.0/za/ | |
dc.source | Audio recordings collected on smartphones in a non-studio environment | |
dc.source | Text prompts from various sources, predominantly from .gov.za (web) | |
dc.title | NCHLT Speech II Corpus | |
dc.type | Data | |
dc.version | 1 | |
local.collection.primary | Resource Catalogue | |
local.collection.secondary | Resource Index |
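The description fixes the audio format (16-bit signed integer PCM, single channel, 16 kHz) and gives each segment's start and end time in seconds. The sketch below, using only Python's standard-library `wave` module, checks a file against that format and cuts out one time-aligned segment. The function names `check_format` and `read_segment` are hypothetical, and since the XML element names are not listed in this record, reading the times out of the XML files is left out.

```python
import wave

# Audio format stated in the corpus description (assumed to hold for every file).
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample: 16-bit signed integer PCM
EXPECTED_CHANNELS = 1      # single channel
EXPECTED_RATE = 16000      # 16 kHz sample rate


def check_format(path: str) -> float:
    """Hypothetical helper: raise ValueError if the WAV file deviates from the
    documented format; otherwise return its duration in seconds."""
    with wave.open(path, "rb") as wav:
        if wav.getcomptype() != "NONE":
            raise ValueError(f"{path}: not uncompressed PCM")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            raise ValueError(f"{path}: {wav.getsampwidth() * 8}-bit samples, expected 16-bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"{path}: {wav.getnchannels()} channels, expected 1")
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(f"{path}: {wav.getframerate()} Hz, expected {EXPECTED_RATE} Hz")
        return wav.getnframes() / wav.getframerate()


def read_segment(path: str, start_s: float, end_s: float) -> bytes:
    """Hypothetical helper: return the raw little-endian 16-bit PCM bytes
    between start_s and end_s (seconds), clipped to the end of the file."""
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    check_format(path)
    with wave.open(path, "rb") as wav:
        n = wav.getnframes()
        # Convert seconds to frame indices; for mono audio one frame is one sample.
        first = min(round(start_s * EXPECTED_RATE), n)
        last = min(round(end_s * EXPECTED_RATE), n)
        wav.setpos(first)
        return wav.readframes(last - first)
```

At this format one second of audio occupies 16,000 × 2 = 32,000 bytes, so for a file at least 3 s long, `read_segment("utt.wav", 1.25, 3.0)` (with `utt.wav` a placeholder name) returns (48,000 − 20,000) × 2 = 56,000 bytes.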
Files
Original bundle
- Name: nchlt_speech_ii_corpus.zip
- Size: 4.34 GB
- Format: ZIP (archive format with lossless compression; may contain multiple files and directories)