NCHLT isiNdebele Auxiliary Speech Corpus

Febe de Wet; Laura Martinus; Jaco Badenhorst

Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/513'

dc.contact.email	KCalteaux@csir.co.za
dc.contact.name	Karen Calteaux
dc.contributor.author	Febe de Wet
dc.contributor.author	Laura Martinus
dc.contributor.author	Jaco Badenhorst
dc.contributor.other	Charl van Heerden
dc.contributor.other	Etienne Barnard
dc.contributor.other	Marelie Davel
dc.contributor.other	Alta de Waal
dc.date.accessioned	2019-07-17T06:30:02Z
dc.date.available	2019-07-17T06:30:02Z
dc.date.issued	2019-06-01
dc.description	The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in XML format.
dc.format.extent	Aux 1: 42:45:06, Aux 2: 120:36:3
dc.format.medium	N/A
dc.format.size	Aux 1: 3.29 GB, Aux 2: 9.43 GB
dc.identifier.citation	Jaco Badenhorst, Laura Martinus and Febe de Wet, "BLSTM harvesting of auxiliary NCHLT speech data", In Proceedings of SAUPEC/ROBMECH/PRASA 2019, Bloemfontein, South Africa, January 2019.
dc.identifier.citation	Etienne Barnard, Marelie H. Davel, Charl van Heerden, Febe de Wet and Jaco Badenhorst, "The NCHLT Speech Corpus of the South African languages", In Proc. 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), St Petersburg, Russia, May 2014.
dc.identifier.citation	Charl van Heerden, Marelie H. Davel and Etienne Barnard, "The semi-automated creation of stratified speech corpora", In Proc. Pattern Recognition Association of South Africa annual symposium (PRASA), Johannesburg, South Africa, Dec 2013, pp. 115-119.
dc.identifier.citation	N.J. de Vries, M.H. Davel, J. Badenhorst, W.D. Basson, F. de Wet, E. Barnard and A. de Waal, "A smartphone-based ASR data collection tool for under-resourced languages", Speech Communication, Volume 56, January 2014, pp. 119-131.
dc.identifier.citation	Marelie H. Davel, Charl van Heerden and Etienne Barnard, "Validating Smartphone-Collected Speech Corpora", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 68-75.
dc.identifier.citation	C. van Heerden, M.H. Davel and E. Barnard, "Medium-Vocabulary Speech Recognition for Under-Resourced Languages", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 146-151.
dc.identifier.citation	J. Badenhorst, A. de Waal and F. de Wet, "Quality measurements for mobile data collection in the developing world", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 139-145.
dc.identifier.uri	https://hdl.handle.net/20.500.12185/513
dc.language.iso	nbl
dc.languages	isiNdebele
dc.media.category	Annotated Monolingual Speech Corpus
dc.media.type	Speech
dc.project	NCHLT Speech
dc.publisher	CSIR Meraka Institute
dc.publisher	North-West University
dc.rights.license	Creative Commons Attribution 3.0 Unported (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/legalcode
dc.subject	isiNdebele	en_ZA
dc.subject	Speech corpora	en_ZA
dc.subject	Transcribed	en_ZA
dc.title	NCHLT isiNdebele Auxiliary Speech Corpus
dc.version	1
local.collection.primary	Resource Catalogue
local.collection.secondary	Resource Index

Files

Original bundle

Name: nbl-aux1.zip
Size: 3.29 GB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.

Name: nbl-aux2.zip
Size: 9.44 GB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
