--------------------------------------------------------------------------------
README: AwezaMed ASR test data
--------------------------------------------------------------------------------

Full name:      AwezaMed automatic speech recognition (ASR) test data
First Release:  December 2020
Version:        1.0
Description:    Broadband test speech prompts collected in real-world
                environments to evaluate automatic speech recognition (ASR)
                accuracy for deployment of the AwezaMed application. See
                "Detailed information" for more information.

Recording set   Size       Duration   # of speakers   Languages (ISO)
-------------------------------------------------------------------------------
Background      38.1 MB    00:05:47   3               eng
Symptoms        91.2 MB    00:13:15   5               eng
Patient         149.9 MB   00:42:11   10              afr, xho, zul

Size:           78.6 MB
Duration:       01:01:13 (Hours:Minutes:Seconds)

This data is shared under Creative Commons Attribution 3.0 Unported
(CC BY 3.0). For more information, see LICENSE.txt.

When using this corpus, please cite:

Laurette Marais, Johannes A. Louw, Jaco Badenhorst, Karen Calteaux,
Ilana Wilken, Nina van Niekerk, Glenn Stein, "AwezaMed: A Multilingual,
Multimodal Speech-To-Speech Translation Application for Maternal Health
Care", in IEEE 23rd International Conference on Information Fusion
(FUSION), Rustenburg, South Africa, July 2020, pp. 1-8.

--------------------------------------------------------------------------------
DETAILED INFORMATION
--------------------------------------------------------------------------------

The corpus contains orthographically transcribed broadband speech in four of
the official languages of South Africa: English, Afrikaans, isiXhosa and
isiZulu. Each respondent read 10 or 20 ASR prompts in a real-world
environment.
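The per-set durations in the recording-set table at the top of this README
add up to the stated total duration. The minimal Python sketch below does
that arithmetic; the dictionary simply copies the table's values, and the
helper names are illustrative rather than part of the corpus.

```python
# Durations copied from the recording-set table (HH:MM:SS).
SET_DURATIONS = {
    "Background": "00:05:47",
    "Symptoms": "00:13:15",
    "Patient": "00:42:11",
}
STATED_TOTAL = "01:01:13"


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total):
    """Convert whole seconds back to an HH:MM:SS string."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# 347 s + 795 s + 2531 s
total_seconds = sum(hms_to_seconds(d) for d in SET_DURATIONS.values())
print(total_seconds, seconds_to_hms(total_seconds))  # 3673 01:01:13
assert seconds_to_hms(total_seconds) == STATED_TOTAL
```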
The following three key environments were identified for audio recording:

Environment | Description           | Types of           | Expected level of noise
number      |                       | consultations      |
--------------------------------------------------------------------------------
1           | Single consultation   | Antenatal          | Minimal background noise.
            | room with bricks      | Consent for        |
            | and doors.            | operations         |
--------------------------------------------------------------------------------
2           | Two beds with brick   | Antenatal          | One other possible
            | walls and curtains    |                    | consultation.
            | and/or drywall        |                    |
            | separations.          |                    |
--------------------------------------------------------------------------------
3           | Four- to ten-bed      | Labour             | Significant background
            | wards with curtain    | Postnatal          | noise (babies and
            | separations.          | Neonatal           | equipment). Conversations
            |                       |                    | in close proximity.
--------------------------------------------------------------------------------

Individual recordings are provided in WAVE format (16-bit, mono, PCM, sampled
at 16 kHz). The recordings are subdivided into three main directories. The
ASR prompt lists from which the phrases were recorded are provided in the
asr_prompts directory. The files "background.txt" and "symptoms.txt" contain
10 and 20 English prompt phrases, respectively. These English prompts are
typical phrases that health care practitioners would use in AwezaMed, and the
recordings of them by different speakers reside in the background and
symptoms main directories, respectively. Metadata, such as the , ,
Environment and is provided in the sub-directory structure. A complete
summary of all metadata across all sub-directories is also provided in the
Summary_of_corpus_data.pdf file of each main directory. The last main
directory, patient, contains recordings of 20 simple responses in three
languages (Afrikaans, isiXhosa and isiZulu), and its sub-directories
therefore contain a identifier as well.
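Because every recording is stated to be 16-bit, mono, PCM WAVE at 16 kHz, a
downloaded copy can be sanity-checked with Python's standard-library `wave`
module. The sketch below is a minimal, unofficial example: it assumes the
three main directories are named background, symptoms and patient (as in the
directory tree), and the function names and the corpus root path are
hypothetical. It reports, per main directory, the number of .wav files, their
total duration in seconds, and any file whose header deviates from the stated
encoding.

```python
import wave
from pathlib import Path

# Encoding stated in this README: 16-bit, mono, PCM, sampled at 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_SAMPLE_RATE = 16_000  # Hz
# Assumed names of the three main recording directories.
MAIN_DIRECTORIES = ("background", "symptoms", "patient")


def inspect_wav(path):
    """Return (deviations from the expected encoding, duration in seconds).

    The wave module only supports uncompressed PCM files and raises
    wave.Error for other encodings, so a successful open implies PCM.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wav.getnchannels()}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample_width={wav.getsampwidth()}")
        if wav.getframerate() != EXPECTED_SAMPLE_RATE:
            problems.append(f"sample_rate={wav.getframerate()}")
        duration = wav.getnframes() / wav.getframerate()
    return problems, duration


def summarise_directory(directory):
    """Return (file count, total seconds, {file name: problems}) for one
    main directory, searching all of its sub-directories."""
    directory = Path(directory)
    if not directory.is_dir():  # a missing directory contributes nothing
        return 0, 0.0, {}
    count, seconds, bad_files = 0, 0.0, {}
    for path in sorted(directory.rglob("*.wav")):
        problems, duration = inspect_wav(path)
        if problems:
            bad_files[path.name] = problems
        seconds += duration
        count += 1
    return count, seconds, bad_files


def summarise_corpus(root):
    """Apply summarise_directory to each main recording directory."""
    return {name: summarise_directory(Path(root) / name)
            for name in MAIN_DIRECTORIES}
```

For example, `summarise_corpus("AwezaMed_ASR_test_data")` (a hypothetical
path to the unpacked corpus) returns one tuple per main directory. If the
recording-set table was computed from the audio itself, the totals should
come out close to 347 s, 795 s and 2531 s (00:05:47, 00:13:15 and 00:42:11),
allowing for rounding.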
--------------------------------------------------------------------------------
CORPUS DIRECTORY/FILE STRUCTURE:
--------------------------------------------------------------------------------

├── asr_prompts
│   ├── background.txt
│   ├── symptoms.txt
│   ├── afr_patient.txt
│   ├── xho_patient.txt
│   └── zul_patient.txt
│
├── background
│   ├── __Environment___10_phrases
│   ├── Summary_of_corpus_data.pdf
│   │   ├── __.wav
│   ...
├── symptoms
│   ├── __Environment___20_phrases
│   ├── Summary_of_corpus_data.pdf
│   │   ├── __.wav
│   ...
├── patient
│   ├── _Environment___
│   ├── Summary_of_corpus_data.pdf
│   │   ├── ____.wav
...

--------------------------------------------------------------------------------
Voice Computing (VC) Research Group at the CSIR
Nextgen Enterprises and Institutions (NGEI)
--------------------------------------------------------------------------------