Morphologically annotated corpus for isiNdebele

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/deed.en)
Authors: Gaustad, Tanja; McKellar, Cindy
Dates: 2024-03-27; 2024-03-27; 2024-01-31
Handle: https://hdl.handle.net/20.500.12185/680

NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is given as txt files. Each line consists of a token and the corresponding morphological analysis, tab separated. The file for isiNdebele contains a total of approximately 42,335 tokens. All the data has been automatically converted, then manually checked and re-annotated where necessary by linguistic experts as well as quality controlled. Please see the included protocol for more details on the morphological tags used.

text; 42,335 tokens; N/A; morphology; annotated; 2Mb
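
The line format described above (one token and its morphological analysis per line, separated by a tab) could be read with a few lines of Python. This is a minimal sketch, not part of the distributed corpus: the function name `read_annotated_tokens` is hypothetical, skipping blank lines is an assumption, and the sample lines are placeholders, not real corpus entries or tags.

```python
from typing import Iterable, Iterator, Tuple


def read_annotated_tokens(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated lines."""
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines carry no annotation and can be skipped.
            continue
        # Split on the first tab only, so any later tabs stay in the analysis.
        token, separator, analysis = line.partition("\t")
        if not separator:
            raise ValueError(f"line {line_number}: no tab separator found")
        yield token, analysis


# Placeholder lines for illustration; these are not real corpus entries.
sample = ["token1\tanalysis1\n", "\n", "token2\tanalysis2\n"]
pairs = list(read_annotated_tokens(sample))
print(len(pairs))
```

To read an actual file, pass an open file handle instead of the list, for example `open(path, encoding="utf-8")`, where `path` points to the downloaded txt file and UTF-8 is an assumed encoding. The tag inventory itself is defined in the protocol included with the corpus.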