NCHLT Sepedi Auxiliary Speech Corpus

Febe de Wet; Laura Martinus; Jaco Badenhorst

Title	NCHLT Sepedi Auxiliary Speech Corpus
Description	The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in XML format.
Contact name	Karen Calteaux
Contact email	KCalteaux@csir.co.za
Publisher(s)	CSIR Meraka Institute; North-West University
License	Creative Commons Attribution 3.0 Unported (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/legalcode
Language(s)	Sesotho sa Leboa (Sepedi)
Author(s)	Febe de Wet; Laura Martinus; Jaco Badenhorst
Contributor	Charl van Heerden; Etienne Barnard; Marelie Davel; Alta de Waal
Subject	Sepedi; Speech corpora; Transcribed
Citation	Jaco Badenhorst, Laura Martinus and Febe de Wet, "BLSTM harvesting of auxiliary NCHLT speech data", in Proceedings of SAUPEC/ROBMECH/PRASA 2019, Bloemfontein, South Africa, January 2019.; Etienne Barnard, Marelie H. Davel, Charl van Heerden, Febe de Wet and Jaco Badenhorst, "The NCHLT Speech Corpus of the South African languages", in Proc. 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), St Petersburg, Russia, May 2014.; Charl van Heerden, Marelie H. Davel and Etienne Barnard, "The semi-automated creation of stratified speech corpora", in Proc. Pattern Recognition Association of South Africa annual symposium (PRASA), Johannesburg, South Africa, Dec 2013, pp. 115-119.; N.J. de Vries, M.H. Davel, J. Badenhorst, W.D. Basson, F. de Wet, E. Barnard and A. de Waal, "A smartphone-based ASR data collection tool for under-resourced languages", Speech Communication, Volume 56, January 2014, pp. 119-131.; Marelie H. Davel, Charl van Heerden and Etienne Barnard, "Validating Smartphone-Collected Speech Corpora", in Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 68-75.; C. van Heerden, M.H. Davel and E. Barnard, "Medium-Vocabulary Speech Recognition for Under-Resourced Languages", in Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 146-151.; J. Badenhorst, A. de Waal and F. de Wet, "Quality measurements for mobile data collection in the developing world", in Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 139-145.
URI	https://hdl.handle.net/20.500.12185/518
Media type	Speech
Media category	Annotated Monolingual Speech Corpus
Format extent	Aux 1: 65:14:39, Aux 2: 52:05:19
Version	1
Format size	Aux 1: 4.99 GB, Aux 2: 4.0 GB
Format medium	N/A
Project	NCHLT Speech
Primary collection	Resource Catalogue
Secondary collection	Resource Index
ISO639 code	nso
Submit date	2019-07-17T07:20:37Z
Date available	2019-07-17T07:20:37Z
Date created	2019-06-01
Verification status	Level 0

Files in this item

Name: nso-aux1.zip
Size: 4.998 GB
Format: application/zip
MD5: 122f6716cb9f9871619d73de0a19cf5c

Name: nso-aux2.zip
Size: 4.009 GB
Format: application/zip
MD5: 206baf3f2de58e86bf297f02f9835f93
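
The MD5 values listed for the two archives can be used to check that a download completed without corruption. The following is a minimal sketch in Python, assuming both zip files have been saved under their listed names in a single local directory. The names EXPECTED_MD5, md5_of_file and verify_downloads are illustrative only and are not part of the corpus or of any repository tooling.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "nso-aux1.zip": "122f6716cb9f9871619d73de0a19cf5c",
    "nso-aux2.zip": "206baf3f2de58e86bf297f02f9835f93",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks.

    Chunked reading avoids loading a multi-gigabyte archive into memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed archive name to True if it exists and its MD5 matches.

    The directory argument is an assumption about where the files were saved.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        status = "OK" if ok else "missing or checksum mismatch"
        print(f"{name}: {status}")
```

Run from the download directory, the script prints one status line per archive. A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.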

This item appears in the following Collection(s)

Resource Catalogue [350]
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
Resource Index [414]
A collection of language resource metadata mostly collected during the NHN funded technology audit of 2009, as well as the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
