Mburisano Covid-19 multilingual corpus
This corpus was created to aid development of the AwezaMed Covid-19 speech-to-speech mobile application. The project within which it was created, Mburisano, was funded by the Department of Sport, Arts and Culture (DSAC). A selection of English sentences was generated in consultation with medical domain experts, and these sentences were manually translated into all other official South African languages. The sentences formed the basis for the rapid development of Grammatical Framework (GF) application grammars for all the languages, to aid spoken communication about Covid-19, with a particular focus on screening and triage. The corpus is presented as a limited-domain, manually translated parallel corpus in all 11 official South African languages. The AwezaMed Covid-19 application can be found [here](
Laurette Marais
CSIR Voice Computing
Creative Commons Attribution 3.0 Unported (CC BY 3.0):
Afrikaans; English; isiNdebele; isiXhosa; isiZulu; Sepedi; Setswana; Sesotho; Siswati; Tshivenda; Xitsonga
Marais, Laurette
Wilken, Ilana; Van Niekerk, Nina; Calteaux, Karen
multilingual text corpus
283 x 11 utterances

This item appears in the following Collection(s)

  • Resource Index [412]
    A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
