--------------------------------------------------------------------------------
README: AwezaMed ASR test data
--------------------------------------------------------------------------------

Full name: AwezaMed automatic speech recognition (ASR) test data

First Release: December 2020

Version: 1.0

Description: Broadband test speech prompts collected in real-world
             environments to evaluate automatic speech recognition (ASR)
             accuracy for AwezaMed application deployment. See
             "DETAILED INFORMATION" below for more details.

Recording set:   Size           Duration     # of Speakers     Languages (ISO)
-------------------------------------------------------------------------------        
Background       38.1 MB        00:05:47           3                  eng
Symptoms         91.2 MB        00:13:15           5                  eng
Patient          149.9 MB       00:42:11           10            afr, xho, zul
                          
Size:         78.6 MB
Duration:     01:01:13 (Hours:Minutes:Seconds)
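
As a quick consistency check, the three per-set durations listed above add up
to the stated total duration. The following is a minimal Python sketch of that
arithmetic; the helper names (parse_hms, format_hms) are illustrative and are
not part of the corpus or any accompanying tooling.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format a timedelta back into 'HH:MM:SS'."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Durations copied from the recording-set table in this README.
SET_DURATIONS = {
    "Background": "00:05:47",
    "Symptoms": "00:13:15",
    "Patient": "00:42:11",
}

total = sum((parse_hms(d) for d in SET_DURATIONS.values()), timedelta())
print(format_hms(total))  # 01:01:13
```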

This data is shared under Creative Commons Attribution 3.0 Unported (CC BY 3.0). 
For more information see LICENSE.txt

When using this corpus, please cite:

Laurette Marais, Johannes A. Louw, Jaco Badenhorst, Karen Calteaux, 
Ilana Wilken, Nina van Niekerk, Glenn Stein, "AwezaMed: A Multilingual, 
Multimodal Speech-To-Speech Translation Application for Maternal Health Care",
in IEEE 23rd International Conference on Information Fusion (FUSION), 
Rustenburg, South Africa, July 2020, pp. 1-8.

--------------------------------------------------------------------------------
DETAILED INFORMATION
--------------------------------------------------------------------------------

The corpus contains orthographically transcribed broadband speech in four 
official languages of South Africa: English, Afrikaans, isiXhosa and isiZulu. 
Each respondent read either 10 or 20 ASR prompts in a real-world environment.

The following three key environments were identified for audio recording:

Environment |   Description       |    Types of      | Expected level of noise
   number   |                     |  consultations   |
--------------------------------------------------------------------------------
     1      | Single consultation | Antenatal        | Minimal background noise.
            | room with brick     | Consent for      |
            | walls and doors.    | operations       |
--------------------------------------------------------------------------------
     2      | Two beds with brick |  Antenatal       | One other possible 
            | walls and curtains  |                  | consultation.
            | and/or drywall      |                  |
            | separations.        |                  |
-------------------------------------------------------------------------------- 
     3      | Four to ten bed     |  Labour          | Significant background 
            | wards with curtain  |  Postnatal       | noise (babies and 
            | separations.        |  Neonatal        | equipment). Conversations 
            |                     |                  | in close proximity.
--------------------------------------------------------------------------------

Individual recordings are provided in WAVE format (16-bit, mono, PCM, sampled
at 16 kHz). The file structure is subdivided into three main directories:
background, symptoms and patient. The ASR prompt lists from which the phrases
were recorded are provided in a separate directory (asr_prompts). The files
"background.txt" and "symptoms.txt" contain 10 and 20 English phrase prompts
respectively. The English prompts include typical phrases that health care
practitioners would use in AwezaMed, and the recordings of the different
speakers reside in the background and symptoms main directories respectively.
Metadata, such as the <location>, <date>, Environment <num> and <spk_id>, is
provided in the sub-directory names. A complete summary of all metadata
across all sub-directories is also provided in each
Summary_of_corpus_data.pdf file. The last main directory, patient, contains
the recordings of 20 simple responses in three languages (Afrikaans, isiXhosa
and isiZulu), and therefore its sub-directories contain a <language>
identifier as well.
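
The audio format stated above (16-bit, mono, 16 kHz PCM) can be checked per
file with Python's standard wave module. The sketch below is illustrative
only: describe_wav and matches_expected are hypothetical helper names, and the
demonstration runs on a synthetic one-second silent file rather than on a
corpus recording, so that it works without the data being present.

```python
import tempfile
import wave
from pathlib import Path

# Format stated in this README: mono, 16-bit (2 bytes per sample), 16 kHz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def describe_wav(path):
    """Return header information and the duration of a PCM WAVE file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def matches_expected(info):
    """True if the header fields match the format stated in the README."""
    return all(info[key] == value for key, value in EXPECTED.items())


# Demonstration on a synthetic file (not a corpus recording).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.wav"
    with wave.open(str(demo), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)  # one second of silence
    info = describe_wav(demo)

print(info["duration_s"], matches_expected(info))  # 1.0 True
```

To check the corpus itself, the same two functions could be applied to every
*.wav file found under the background, symptoms and patient directories.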

--------------------------------------------------------------------------------
CORPUS DIRECTORY/FILE STRUCTURE:
--------------------------------------------------------------------------------
<data>
├── asr_prompts
│   ├── background.txt
│   ├── symptoms.txt
│   ├── afr_patient.txt
│   ├── xho_patient.txt
│   └── zul_patient.txt
│
├── background
│   ├── <location>_<date>_Environment_<num>_<spk_id>_10_phrases
│   │   ├── <location_id>_<spk_id>_<words_of_prompt>.wav
│   ├── Summary_of_corpus_data.pdf
│   ...
├── symptoms
│   ├── <location>_<date>_Environment_<num>_<spk_id>_20_phrases
│   │   ├── <location_id>_<spk_id>_<words_of_prompt>.wav
│   ├── Summary_of_corpus_data.pdf
│   ...
└── patient
    ├── <location>_Environment_<num>_<language>_<spk_id>
    │   ├── <location_id>_<lang_id>_<spk_id>_<environment_id>_<words_of_prompt>.wav
    ├── Summary_of_corpus_data.pdf
    ...
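
Because the metadata is encoded in the sub-directory names, it can be
recovered with a pair of regular expressions. The following Python sketch is
one possible approach under stated assumptions: the <date> and <language>
fields are assumed to contain no underscores, and the example directory names
at the bottom are invented for illustration (the real <location>, <date> and
<spk_id> values are not documented here). The function name parse_dir_name is
hypothetical.

```python
import re

# background/ and symptoms/ sub-directories:
#   <location>_<date>_Environment_<num>_<spk_id>_<10|20>_phrases
PHRASE_DIR = re.compile(
    r"^(?P<location>.+)_(?P<date>[^_]+)_Environment_(?P<num>\d+)"
    r"_(?P<spk_id>.+)_(?P<phrases>\d+)_phrases$"
)

# patient/ sub-directories:
#   <location>_Environment_<num>_<language>_<spk_id>
PATIENT_DIR = re.compile(
    r"^(?P<location>.+)_Environment_(?P<num>\d+)"
    r"_(?P<language>[^_]+)_(?P<spk_id>.+)$"
)


def parse_dir_name(name):
    """Return the metadata fields encoded in a sub-directory name, or None.

    The phrase pattern is tried first because it is the more specific one:
    a name ending in '_phrases' would otherwise also fit the patient pattern.
    """
    for pattern in (PHRASE_DIR, PATIENT_DIR):
        match = pattern.match(name)
        if match:
            return match.groupdict()
    return None


# Hypothetical directory names, made up for this example only.
phrase_meta = parse_dir_name("ClinicA_2020-01-15_Environment_1_spk01_10_phrases")
patient_meta = parse_dir_name("ClinicA_Environment_3_zul_spk07")

print(phrase_meta)
print(patient_meta)
```

All extracted values are strings; convert "num" or "phrases" with int() if
numeric comparison is needed.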


--------------------------------------------------------------------------------
Voice Computing (VC) Research Group 
at the CSIR Nextgen Enterprises and Institutions (NGEI)
--------------------------------------------------------------------------------