Morphologically annotated corpus for Tshivenḓa
License agreement
By downloading this resource I accept and agree to the terms of use and the associated license conditions under which the resource is distributed.
MD5: eda2e1b218ff14b95ee50ece9cd35e93
MD5: 57f4a2adc4893ddae00e238e04b93302
MD5: 741413fb547f2da0297497c530a9095c
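The MD5 checksums listed above can be used to confirm that a downloaded file is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib` module. The function names are illustrative, and this page does not say which checksum belongs to which file, so the file name in the usage example is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `verify_md5("downloaded_file.zip", "eda2e1b218ff14b95ee50ece9cd35e93")` (hypothetical file name) returns `True` only if the file matches the first checksum listed. A mismatch usually means an incomplete or corrupted download.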
Author(s)
Gaustad, Tanja
Description
NCHLT corpus of morphologically annotated tokens in Tshivenḓa converted to the tags used during phases 1 and 2 of the SADiLaR-II project.
The data is provided as plain-text (.txt) files. Each line contains a token and its morphological analysis, separated by a tab.
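The following is a minimal sketch of reading such a file into (token, analysis) pairs in Python. It assumes UTF-8 encoding, which this page does not state. It also assumes that the first tab on each line separates the token from its analysis and that blank lines can be skipped. The function name `read_annotated_tokens` is hypothetical, and the analysis string is kept as-is because the tag format is defined in the included protocol.

```python
from typing import List, Tuple


def read_annotated_tokens(path: str, encoding: str = "utf-8") -> List[Tuple[str, str]]:
    """Read (token, analysis) pairs from a tab-separated annotation file.

    Assumptions: UTF-8 text, one token per line, token and analysis split
    at the first tab character, blank lines ignored.
    """
    pairs: List[Tuple[str, str]] = []
    with open(path, encoding=encoding) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines, if any
            token, sep, analysis = line.partition("\t")
            if not sep:
                # A line without a tab does not fit the described format.
                raise ValueError(f"line {line_number}: no tab separator found")
            pairs.append((token, analysis))
    return pairs
```

Under these assumptions, `len(read_annotated_tokens(path))` gives the number of annotated tokens in a file. This count can be compared with the total stated for the Tshivenḓa file.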
The file for Tshivenḓa contains a total of 66,487 tokens. All the data was first converted automatically, then manually checked and, where necessary, re-annotated by linguistic experts, and finally quality-controlled. Please see the included protocol for details of the morphological tags used.
Contact person
T. Gaustad
Contact person's e-mail address
tanja.gaustad@nwu.ac.za
Publisher(s)
Centre for Text Technology (CTexT)