Morphologically annotated corpus for Tshivenḓa

Title	Morphologically annotated corpus for Tshivenḓa
Description	NCHLT corpus of morphologically annotated tokens in Tshivenḓa, converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is provided as .txt files. Each line consists of a token and its morphological analysis, separated by a tab. The Tshivenḓa file contains a total of 66,487 tokens. All the data was automatically converted, then manually checked, re-annotated where necessary by linguistic experts, and quality controlled. Please see the included protocol for more details on the morphological tags used.
Contact name	T. Gaustad
Contact email	tanja.gaustad@nwu.ac.za
Publisher(s)	Centre for Text Technology (CTexT)
License	CC BY 4.0
Language(s)	Tshivenḓa
Author(s)	Gaustad, Tanja
Contributor	McKellar, Cindy
Subject	morphology; annotated
URI	https://hdl.handle.net/20.500.12185/673
Media type	Text
Media category	annotated text corpus
Format extent	66,487 tokens
Version	1.0
Format size	2 MB
Format medium	N/A
Project	Linguistic corpus enrichment for South African languages
Submit date	2024-03-27T08:25:25Z
Date available	2024-03-27T08:25:25Z
Date created	2024-01-31
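
The description states that each line holds one token and its morphological analysis, separated by a tab. The following is a minimal Python sketch of a reader for that layout, not an official loader shipped with the corpus. The function names `parse_lines` and `load_corpus` are hypothetical, and the choices to skip blank lines and to reject lines without exactly one tab are assumptions about the files, not something the description confirms. The analysis string is returned as-is; consult the included protocol for how to interpret the tags.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated corpus lines."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")  # drop the line ending, keep other content intact
        if not line.strip():
            continue  # assumption: blank lines carry no token
        parts = line.split("\t")
        if len(parts) != 2:
            # assumption: every non-blank line has exactly one tab
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated fields, got {len(parts)}"
            )
        token, analysis = parts
        yield token, analysis


def load_corpus(path: str) -> List[Tuple[str, str]]:
    """Read a whole corpus file (UTF-8 assumed) into a list of pairs."""
    with open(path, encoding="utf-8") as f:
        return list(parse_lines(f))
```

Because the description says each line holds one token, `len(load_corpus(path))` for the Tshivenḓa file should match the stated 66,487 tokens if the files follow that layout; the path is whatever the downloaded file is named. UTF-8 is assumed so that characters such as ḓ are read correctly.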

Resource Catalogue
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.