
Mburisano Covid-19 multilingual corpus

dc.contact.email: laurette.p@gmail.com [en_ZA]
dc.contact.name: Laurette Marais [en_ZA]
dc.contributor.author: Marais, Laurette
dc.contributor.other: Wilken, Ilana
dc.contributor.other: Van Niekerk, Nina
dc.contributor.other: Calteaux, Karen
dc.date.accessioned: 2021-03-02T09:39:39Z
dc.date.available: 2021-03-02T09:39:39Z
dc.date.issued: 2020-12-04
dc.description: This corpus was created to aid development of the AwezaMed Covid-19 speech-to-speech mobile application. The project within which it was created, Mburisano, was funded by the Department of Sport, Arts and Culture (DSAC). A selection of English sentences was generated in consultation with medical domain experts, and these sentences were manually translated into all official South African languages. The sentences formed the basis of the rapid development of Grammatical Framework (GF) application grammars for all the languages, to aid spoken communication about Covid-19 with a particular focus on screening and triage. The corpus is presented as a limited domain, manually translated parallel corpus in all 11 official South African languages. The AwezaMed Covid-19 application can be found [here](https://play.google.com/store/apps/details?id=za.co.aweza.covid19&gl=ZA). [en_ZA]
dc.format: csv [en_ZA]
dc.format.extent: 283 x 11 utterances [en_ZA]
dc.format.size: 150kB [en_ZA]
dc.identifier.uri: https://hdl.handle.net/20.500.12185/536
dc.languages: Afrikaans [en_ZA]
dc.languages: English [en_ZA]
dc.languages: isiNdebele [en_ZA]
dc.languages: isiXhosa [en_ZA]
dc.languages: isiZulu [en_ZA]
dc.languages: Sepedi [en_ZA]
dc.languages: Setswana [en_ZA]
dc.languages: Sesotho [en_ZA]
dc.languages: Siswati [en_ZA]
dc.languages: Tshivenda [en_ZA]
dc.languages: Xitsonga [en_ZA]
dc.media.category: multilingual text corpus [en_ZA]
dc.media.type: Text [en_ZA]
dc.project: Mburisano [en_ZA]
dc.publisher: CSIR Voice Computing [en_ZA]
dc.rights.license: Creative Commons Attribution 3.0 Unported (CC BY 3.0): https://www.creativecommons.org/licenses/by/3.0/ [en_ZA]
dc.subject: Covid-19 [en_ZA]
dc.title: Mburisano Covid-19 multilingual corpus [en_ZA]

Files

Original bundle

Name: mburisano_multilingual_corpus.csv
Size: 148.56 KB
Format: Unknown data format

Name: README.md
Size: 975 B
Format: Unknown data format
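
The record describes the corpus as a CSV file of 283 x 11 utterances, one set of parallel translations per official language. A minimal Python sketch of how such a file might be read and sanity-checked follows. It assumes a comma-delimited, UTF-8 file with one column per language and one row per utterance; whether the real file has a header row, and what its columns are called, is not stated in this record, so `has_header` and the placeholder column names below are assumptions, and `load_parallel_corpus` and `check_shape` are hypothetical helper names.

```python
import csv
import io


def load_parallel_corpus(stream, has_header=True):
    """Read a parallel-corpus CSV into (header, rows).

    Assumption: one column per language, one row per utterance.
    Blank lines are skipped.
    """
    reader = csv.reader(stream)
    rows = [row for row in reader if row]
    header = rows.pop(0) if has_header and rows else []
    return header, rows


def check_shape(rows, expected_columns):
    """Return the indices of rows whose column count differs from expected."""
    return [i for i, row in enumerate(rows) if len(row) != expected_columns]


# Tiny in-memory stand-in for the real file; the content is placeholder
# text, not taken from the corpus.
sample = io.StringIO("English,Afrikaans\nsentence 1 (en),sentence 1 (af)\n")
header, rows = load_parallel_corpus(sample)
print(header, len(rows), check_shape(rows, len(header)))
```

To try this on the downloaded file, pass `open("mburisano_multilingual_corpus.csv", newline="", encoding="utf-8")` instead of the in-memory sample. If the layout assumptions hold, `check_shape(rows, 11)` should return an empty list, and the row count can be compared against the 283 utterances given in `dc.format.extent`.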

License bundle

Name: license.txt
Size: 3.23 KB
Format: Item-specific license agreed upon to submission
