Linguistically enriched corpora for conjunctively written South African languages
Archive containing a Readme, a train folder with 5 text files and a test folder with (currently) a readme. The test files will be uploaded after the deadline of the DHASA 2021 shared task has passed. (1.133Mb)
This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family (isiNdebele, isiXhosa, isiZulu and Siswati) as well as English. The data set is parallel for all five languages and the Nguni languages have been annotated for three different types of linguistic information: morphology, part-of-speech and lemmas.
Contact person: Tanja Gaustad
Contact person's e-mail: firstname.lastname@example.org
North-West University, Centre for Language Technology (CTexT)