
NCHLT isiNdebele Auxiliary Speech Corpus

dc.contact.email: KCalteaux@csir.co.za
dc.contact.name: Karen Calteaux
dc.contributor.author: Febe de Wet
dc.contributor.author: Laura Martinus
dc.contributor.author: Jaco Badenhorst
dc.contributor.other: Charl van Heerden
dc.contributor.other: Etienne Barnard
dc.contributor.other: Marelie Davel
dc.contributor.other: Alta de Waal
dc.date.accessioned: 2019-07-17T06:30:02Z
dc.date.available: 2019-07-17T06:30:02Z
dc.date.issued: 2019-06-01
dc.description: The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in XML format.
dc.format.extent: Aux 1: 42:45:06, Aux 2: 120:36:3
dc.format.medium: N/A
dc.format.size: Aux 1: 3.29 GB, Aux 2: 9.43 GB
dc.identifier.citation: Jaco Badenhorst, Laura Martinus and Febe de Wet, "BLSTM harvesting of auxiliary NCHLT speech data", in Proceedings of SAUPEC/ROBMECH/PRASA 2019, Bloemfontein, South Africa, January 2019.
dc.identifier.citation: Etienne Barnard, Marelie H. Davel, Charl van Heerden, Febe de Wet and Jaco Badenhorst, "The NCHLT Speech Corpus of the South African languages", in Proc. 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), St Petersburg, Russia, May 2014.
dc.identifier.citation: Charl van Heerden, Marelie H. Davel and Etienne Barnard, "The semi-automated creation of stratified speech corpora", in Proc. Pattern Recognition Association of South Africa annual symposium (PRASA), Johannesburg, South Africa, Dec 2013, pp. 115-119.
dc.identifier.citation: N.J. de Vries, M.H. Davel, J. Badenhorst, W.D. Basson, F. de Wet, E. Barnard and A. de Waal, "A smartphone-based ASR data collection tool for under-resourced languages", Speech Communication, Volume 56, January 2014, pp. 119-131.
dc.identifier.citation: Marelie H. Davel, Charl van Heerden and Etienne Barnard, "Validating Smartphone-Collected Speech Corpora", in Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 68-75.
dc.identifier.citation: C. van Heerden, M.H. Davel and E. Barnard, "Medium-Vocabulary Speech Recognition for Under-Resourced Languages", in Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 146-151.
dc.identifier.citation: J. Badenhorst, A. de Waal and F. de Wet, "Quality measurements for mobile data collection in the developing world", in Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 139-145.
dc.identifier.uri: https://hdl.handle.net/20.500.12185/513
dc.language.iso: nbl
dc.languages: isiNdebele
dc.media.category: Annotated Monolingual Speech Corpus
dc.media.type: Speech
dc.project: NCHLT Speech
dc.publisher: CSIR Meraka Institute
dc.publisher: North-West University
dc.rights.license: Creative Commons Attribution 3.0 Unported (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/legalcode
dc.subject: isiNdebele; Speech corpora; Transcribed
dc.title: NCHLT isiNdebele Auxiliary Speech Corpus
dc.version: 1
local.collection.primary: Resource Catalogue
local.collection.secondary: Resource Index

Files

Original bundle

Name: nbl-aux1.zip
Size: 3.29 GB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
Name: nbl-aux2.zip
Size: 9.44 GB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.