NCHLT Tshivenda Auxiliary Speech Corpus

Febe de Wet; Laura Martinus; Jaco Badenhorst

Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/516'


Files

ven-aux1.zip (7.3 GB)

ven-aux2.zip (4.14 GB)

Date

2019-06-01

Authors

Febe de Wet

Laura Martinus

Jaco Badenhorst

Publisher

CSIR Meraka Institute
North-West University

Description

The NCHLT corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages; this item contains the Tshivenḓa auxiliary data. Transcriptions are provided in XML format.

Keywords

Tshivenḓa, Speech corpora, Transcribed

Citation

Jaco Badenhorst, Laura Martinus and Febe de Wet, "BLSTM harvesting of auxiliary NCHLT speech data", In Proc. SAUPEC/ROBMECH/PRASA 2019, Bloemfontein, South Africa, January 2019.
Etienne Barnard, Marelie H. Davel, Charl van Heerden, Febe de Wet and Jaco Badenhorst, "The NCHLT Speech Corpus of the South African languages", In Proc. 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), St Petersburg, Russia, May 2014.
Charl van Heerden, Marelie H. Davel and Etienne Barnard, "The semi-automated creation of stratified speech corpora", In Proc. Pattern Recognition Association of South Africa annual symposium (PRASA), Johannesburg, South Africa, December 2013, pp. 115-119.
N.J. de Vries, M.H. Davel, J. Badenhorst, W.D. Basson, F. de Wet, E. Barnard and A. de Waal, "A smartphone-based ASR data collection tool for under-resourced languages", Speech Communication, Volume 56, January 2014, pp. 119-131.
Marelie H. Davel, Charl van Heerden and Etienne Barnard, "Validating Smartphone-Collected Speech Corpora", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 68-75.
C. van Heerden, M.H. Davel and E. Barnard, "Medium-Vocabulary Speech Recognition for Under-Resourced Languages", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 146-151.
J. Badenhorst, A. de Waal and F. de Wet, "Quality measurements for mobile data collection in the developing world", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 139-145.

License

Creative Commons Attribution 3.0 Unported (CC BY 3.0)

URI

https://hdl.handle.net/20.500.12185/516

Collections

Resource Catalogue
Resource Index

Verification status

Level 0
