
Morphologically annotated corpus for Tshivenḓa
NCHLT corpus of morphologically annotated Tshivenḓa tokens, converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is provided as plain-text (.txt) files. Each line consists of a token and its morphological analysis, separated by a tab. The Tshivenḓa file contains 66,487 tokens in total. All data was converted automatically, then manually checked and, where necessary, re-annotated by linguistic experts, and quality-controlled. See the included protocol for details on the morphological tags used.
T. Gaustad
tanja.gaustad@nwu.ac.za
Centre for Text Technology (CTexT)
CC BY 4.0
Tshivenda
Gaustad, Tanja
McKellar, Cindy
morphology; annotated
https://hdl.handle.net/20.500.12185/673
Text
annotated text corpus
66,487 tokens
1.0
2 MB
N/A
Linguistic corpus enrichment for South African languages
2024-03-27T08:25:25Z
2024-03-27T08:25:25Z
2024-01-31
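
The description above states that each line holds a token and its morphological analysis, separated by a tab. A minimal sketch of reading that layout follows. The function name `read_annotations`, the skipping of blank lines, and the sample strings are assumptions made for illustration; the sample is placeholder text, not real corpus data, and the included protocol remains the reference for the actual tag format.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_annotations(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated lines."""
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines carry no token and can be skipped.
            continue
        # Split on the first tab only, so the analysis is kept whole.
        token, separator, analysis = line.partition("\t")
        if not separator:
            raise ValueError(f"line {line_number}: expected a tab separator")
        yield token, analysis


# Placeholder content standing in for a corpus file (hypothetical values).
sample = io.StringIO("token_a\tanalysis_a\n\ntoken_b\tanalysis_b\n")
pairs = list(read_annotations(sample))
print(len(pairs), "token/analysis pairs read")
```

To read an actual file from the download, the same function could be given a handle opened with `open(path, encoding="utf-8")`. UTF-8 is an assumption here, chosen because the language name itself contains the non-ASCII character ḓ.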



This item appears in the following Collection(s)

  • Resource Catalogue [349]
    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
