
NCHLT Sepedi Auxiliary Speech Corpus
The NCHLT corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages; this item contains the auxiliary Sepedi data. Transcriptions are provided in XML format.
Karen Calteaux
KCalteaux@csir.co.za
CSIR Meraka Institute; North-West University
Creative Commons Attribution 3.0 Unported (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/legalcode
Sesotho sa Leboa (Sepedi)
Febe de Wet; Laura Martinus; Jaco Badenhorst
Charl van Heerden; Etienne Barnard; Marelie Davel; Alta de Waal
Sepedi; Speech corpora; Transcribed
  • Jaco Badenhorst, Laura Martinus and Febe de Wet, "BLSTM harvesting of auxiliary NCHLT speech data", In Proc. SAUPEC/ROBMECH/PRASA 2019, Bloemfontein, South Africa, January 2019.
  • Etienne Barnard, Marelie H. Davel, Charl van Heerden, Febe de Wet and Jaco Badenhorst, "The NCHLT Speech Corpus of the South African languages", In Proc. 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), St Petersburg, Russia, May 2014.
  • Charl van Heerden, Marelie H. Davel and Etienne Barnard, "The semi-automated creation of stratified speech corpora", In Proc. Pattern Recognition Association of South Africa annual symposium (PRASA), Johannesburg, South Africa, December 2013, pp. 115-119.
  • N.J. de Vries, M.H. Davel, J. Badenhorst, W.D. Basson, F. de Wet, E. Barnard and A. de Waal, "A smartphone-based ASR data collection tool for under-resourced languages", Speech Communication, Volume 56, January 2014, pp. 119-131.
  • Marelie H. Davel, Charl van Heerden and Etienne Barnard, "Validating Smartphone-Collected Speech Corpora", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 68-75.
  • C. van Heerden, M.H. Davel and E. Barnard, "Medium-Vocabulary Speech Recognition for Under-Resourced Languages", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 146-151.
  • J. Badenhorst, A. de Waal and F. de Wet, "Quality measurements for mobile data collection in the developing world", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 139-145.
https://hdl.handle.net/20.500.12185/518
Speech
Annotated Monolingual Speech Corpus
Aux 1: 65:14:39; Aux 2: 52:05:19 (hh:mm:ss)
1
Aux 1: 4.99 GB; Aux 2: 4.0 GB
N/A
NCHLT Speech
Resource Catalogue
Resource Index
nso
2019-07-17T07:20:37Z
2019-07-17T07:20:37Z
2019-06-01


This item appears in the following Collection(s)

  • Resource Catalogue [348]
    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
  • Resource Index [411]
    A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.