Lagos-NWU Yoruba Speech Corpus
Title | Lagos-NWU Yoruba Speech Corpus |
Description | This speech corpus, comprising 16 female and 17 male speakers, was recorded in Lagos, Nigeria, for speech recognition research. Each speaker recorded about 130 utterances read from short texts selected for phonetic coverage. Recordings were made with a microphone connected to a laptop computer in a quiet office environment. |
Contact name | Jacques van Heerden |
Contact email | jacques.vanheerden@nwu.ac.za |
Publisher(s) | North-West University; Centre for Text Technology (CTexT); University of Lagos (Nigeria) |
License | Creative Commons Attribution 2.5 South Africa License: http://creativecommons.org/licenses/by/2.5/za/legalcode |
Language(s) | Yoruba |
Author(s) | Daniel van Niekerk; Etienne Barnard; Oluwapelumi Giwa; Azeez Sosimi |
URI | https://hdl.handle.net/20.500.12185/431 |
ISLRN | 573-526-122-515-8 |
Media type | Speech |
Type | Data |
Media category | Monolingual speech corpora: Annotated |
Format extent | 268 MB (zipped) |
Version | 1 |
Format size | Number of speakers: 33; Number of utterances: 4316; Audio length: 165 min (including non-speech segments); Per speaker: approx. 130 utterances, amounting to approx. 5 minutes of audio |
Format medium | UTF-8 encoded Unicode text; RIFF-WAVE 16-bit PCM samples at 16 kHz sampling rate |
Source | Web; Magazines; Literature and student reports; Audio recordings (normal office environment) |
Stratum | 16 female speakers and 17 male speakers recorded in Lagos, Nigeria |
Primary collection | Resource Catalogue |
Secondary collection | Resource Index |
ISO639 code | yor |
Submit date | 2018-02-05T20:20:56Z; 2018-03-05T17:51:10Z |
Date available | 2018-02-05T20:20:56Z; 2018-03-05T17:51:10Z |
Date created | 2015-02-06 |
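
The "Format size" and "Format medium" fields above can be sanity-checked with a short script. The following is a minimal sketch, not part of the corpus distribution: it uses Python's standard `wave` module to test whether a file matches the stated 16-bit, 16 kHz format, and derives the per-speaker averages from the stated totals. The helper names `check_wav` and `make_silent_wav` are hypothetical, and the in-memory silent file is only a stand-in for a real corpus recording, since the file layout of the archive is not described here.

```python
import io
import wave

# Values taken from the "Format size" and "Format medium" fields of this record.
SPEAKERS = 33
UTTERANCES = 4316
AUDIO_MINUTES = 165
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


def check_wav(source):
    """Return (matches_stated_format, duration_seconds) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        matches = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )
        duration = wav.getnframes() / wav.getframerate()
    return matches, duration


def make_silent_wav(seconds, rate=EXPECTED_RATE_HZ):
    """Build an in-memory 16-bit WAV of silence (assumed mono; a stand-in for a corpus file)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


# Averages implied by the stated totals.
utterances_per_speaker = UTTERANCES / SPEAKERS
minutes_per_speaker = AUDIO_MINUTES / SPEAKERS

if __name__ == "__main__":
    ok, seconds = check_wav(make_silent_wav(2))
    print(f"format matches: {ok}, duration: {seconds:.1f} s")
    print(f"utterances per speaker: {utterances_per_speaker:.1f}")
    print(f"minutes per speaker: {minutes_per_speaker:.1f}")
```

To check a real recording, pass its path to `check_wav` in place of the in-memory buffer. The averages agree with the "approx. 130 utterances" and "approx. 5 minutes" per speaker stated in the record.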
This item appears in the following Collection(s)
- Resource Catalogue [349]
  A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
- Resource Index [412]
  A collection of language resource metadata mostly collected during the NHN-funded technology audit of 2009, as well as the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.