
Morphologically annotated corpus for Xitsonga

Deposit Licenses

Date

2024-01-31

Authors

Gaustad, Tanja

Publisher

Centre for Text Technology (CTexT)

Abstract

Description

NCHLT corpus of morphologically annotated Xitsonga tokens, converted to the tag set used during phases 1 and 2 of the SADiLaR-II project. The data is provided as plain-text (.txt) files. Each line consists of a token and its morphological analysis, separated by a tab. The Xitsonga file contains 69,584 tokens in total. All data was converted automatically, then manually checked and, where necessary, re-annotated by linguistic experts, and finally quality controlled. See the included protocol for details on the morphological tags used.
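
The line format described above (token, tab, morphological analysis) can be read with a few lines of Python. This is a minimal sketch, not a tool shipped with the corpus: the function names, the UTF-8 encoding, and the handling of blank lines are assumptions, and the sample strings are placeholders rather than real corpus entries. The actual file name and tag inventory should be taken from the deposit and its protocol.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated lines.

    Assumption: blank lines carry no annotation and are skipped.
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        # Split on the first tab only, so any later tabs stay in the analysis.
        token, sep, analysis = line.partition("\t")
        if not sep:
            raise ValueError(f"line {number}: no tab separator found")
        yield token, analysis


def read_corpus(path: str, encoding: str = "utf-8") -> List[Tuple[str, str]]:
    """Read a whole corpus file. The UTF-8 encoding is an assumption."""
    with open(path, encoding=encoding) as handle:
        return list(parse_lines(handle))


# Placeholder lines, not actual corpus content.
sample = ["tokenA\tanalysisA\n", "\n", "tokenB\tanalysisB\r\n"]
pairs = list(parse_lines(sample))
print(len(pairs))
```

Calling `read_corpus` on the downloaded Xitsonga file and taking `len()` of the result gives a quick check against the stated total of 69,584 tokens, provided the file contains no lines beyond token entries.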

Citation

License

CC BY 4.0

Verification status

Level 0