
Linguistically enriched corpora for conjunctively written South African languages

Date

2021-09

Authors

Puttkammer, Martin
Gaustad, Tanja

Publisher

North-West University, Centre for Language Technology (CTexT)

Description

This resource contains linguistically annotated data for four official South African languages from the Nguni family that are written with a conjunctive orthography (isiNdebele, isiXhosa, isiZulu and Siswati), together with English. The data set is parallel across all five languages, and the Nguni languages have been annotated with three types of linguistic information: morphology, part-of-speech and lemmas. The protocols and tagsets used during annotation are also included.

Citation

https://doi.org/10.1016/j.dib.2022.107994

Verification status

Level 0