
CSIR SAMA Speech Corpus Manual Datasets

dc.contact.email [en_ZA]: jbadenhorst@csir.co.za
dc.contact.name [en_ZA]: Jaco Badenhorst
dc.contributor.author: Badenhorst, Jaco
dc.contributor.author: Mak, Franco
dc.date.accessioned: 2025-02-13T13:50:52Z
dc.date.available: 2025-02-13T13:50:52Z
dc.date.issued: 2023-12
dc.description [en_ZA]: The evaluation corpus contains orthographically transcribed broadband speech in Afrikaans, isiXhosa, isiZulu, Sepedi, Sesotho and Tshivenḓa, all of which are among South Africa’s eleven official written languages. The audio was harvested as MP3 podcasts and automatically segmented and transcribed. Segment transcriptions are provided in XML format.
Afrikaans
• News - 20:04:15 - Bulletins: 377
• Drama - 12:35:47 - Episodes: 351
Sepedi
• Drama - 10:25:16 - Episodes: 321
Sesotho
• News - 14:49:11 - Bulletins: 326
• Drama - 10:01:17 - Episodes: 200
isiXhosa
• News - 14:57:20 - Bulletins: 325
• Drama - 09:41:42 - Episodes: 190
isiZulu
• News - 15:55:14 - Bulletins: 349
• Drama - 07:58:10 - Episodes: 124
Tshivenḓa
• Drama - 12:32:29 - Episodes: 275
dc.format [en_ZA]: WAV
dc.identifier.uri: https://hdl.handle.net/20.500.12185/689
dc.languages [en_ZA]: Afrikaans
dc.languages [en_ZA]: isiXhosa
dc.languages [en_ZA]: isiZulu
dc.languages [en_ZA]: Sepedi
dc.languages [en_ZA]: Sesotho
dc.languages [en_ZA]: Tshivenda
dc.media.category [en_ZA]: Annotated Monolingual Speech Corpus
dc.media.type [en_ZA]: Speech
dc.publisher [en_ZA]: Voice Computing (VC) Research Group at the CSIR Nextgen Enterprises and Institutions (NGEI)
dc.publisher [en_ZA]: SADiLaR
dc.rights.license [en_ZA]: Creative Commons Attribution 4.0 International (CC-BY 4.0): https://www.creativecommons.org/licenses/by/4.0/
dc.subject [en_ZA]: Speech corpora
dc.subject [en_ZA]: Data harvesting
dc.subject [en_ZA]: Transcription
dc.subject [en_ZA]: Segmentation
dc.title [en_ZA]: CSIR SAMA Speech Corpus Manual Datasets
local.url [en_ZA]: https://repo.sadilar.org/handle/20.500.12185/9/submit/6e66107b60294b642f8b2b1e317c71205c3e3034.continue

Files

License bundle

Name: license.txt
Size: 3.22 KB
Format: Item-specific license agreed upon to submission
