Morphologically annotated corpus for Tshivenḓa

Gaustad, Tanja

Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/673'

Files

Protocol.SADiLaR.MorphologicalAnalysisTshivenda.Final.2024-01-31.doc (420.5 KB)

SADII-Ext.MorphDataNCHLTConverted.Final.2024-01-31.ve.txt (1.54 MB)

README.Morph.Final.2024-01-31.txt (2.4 KB)

Deposit Licenses

license.txt (3.22 KB)

Date

2024-01-31

Authors

Gaustad, Tanja

Publisher

Centre for Text Technology (CTexT)

Description

NCHLT corpus of morphologically annotated tokens in Tshivenḓa, converted to the tag set used during phases 1 and 2 of the SADiLaR-II project. The data is provided in plain-text (txt) format. Each line consists of a token and its morphological analysis, separated by a tab. The Tshivenḓa file contains a total of 66,487 tokens. All data was converted automatically, then manually checked and re-annotated where necessary by linguistic experts, and finally quality controlled. Please see the included protocol for details on the morphological tags used.
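
The line format described above (token, tab, morphological analysis) can be read with a few lines of Python. This is a minimal sketch, not part of the deposit: the function names `parse_morph_lines` and `load_corpus` are hypothetical, and UTF-8 encoding and the skipping of blank lines are assumptions that should be checked against the README and protocol.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_morph_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated lines."""
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines carry no annotation and can be skipped.
            continue
        # Split on the first tab only, so the analysis is kept intact.
        token, sep, analysis = line.partition("\t")
        if not sep:
            raise ValueError(f"line {number}: no tab separator found")
        yield token, analysis


def load_corpus(path: str, encoding: str = "utf-8") -> List[Tuple[str, str]]:
    """Read a whole corpus file. The UTF-8 default is an assumption."""
    with open(path, encoding=encoding) as handle:
        return list(parse_morph_lines(handle))
```

Calling `load_corpus` on the downloaded txt file would return a list of pairs; if the assumptions hold, its length should be comparable to the token count stated in the description. A line without a tab raises an error rather than being dropped silently, so format deviations are visible.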

Keywords

morphology, annotated

License

CC BY 4.0

URI

https://hdl.handle.net/20.500.12185/673

Collections

Resource Catalogue

Verification status

Level 0
