Morphologically annotated corpus for Xitsonga
Title | Morphologically annotated corpus for Xitsonga |
Description | NCHLT corpus of morphologically annotated Xitsonga tokens, converted to the tag set used during phases 1 and 2 of the SADiLaR-II project. The data is provided as plain-text (.txt) files. Each line consists of a token and its morphological analysis, separated by a tab. The Xitsonga file contains 69,584 tokens in total. All data was converted automatically, then manually checked and, where necessary, re-annotated by linguistic experts, and quality controlled. See the included protocol for details on the morphological tags used. |
Contact name | T. Gaustad |
Contact email | tanja.gaustad@nwu.ac.za |
Publisher(s) | Centre for Text Technology (CTexT) |
License | CC BY 4.0 |
Language(s) | Xitsonga |
Author(s) | Gaustad, Tanja |
Contributor | McKellar, Cindy |
Subject | morphology; annotated |
URI | https://hdl.handle.net/20.500.12185/672 |
Media type | Text |
Media category | annotated text corpus |
Format extent | 69,584 tokens |
Version | 1.0 |
Format size | 2 MB |
Format medium | N/A |
Project | Linguistic corpus enrichment for South African languages |
Submit date | 2024-03-27T08:25:13Z |
Date available | 2024-03-27T08:25:13Z |
Date created | 2024-01-31 |
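The Description field states that each line of the data holds a token and its morphological analysis, separated by a tab. A minimal sketch of reading that layout in Python follows. The function names `read_annotations` and `load_corpus`, the UTF-8 encoding, and the skipping of blank lines are assumptions made for illustration and are not taken from the resource or its protocol. The strings in the usage note are placeholders, not real Xitsonga tokens or tags.

```python
from typing import Iterable, Iterator, List, Tuple


def read_annotations(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines carry no annotation and are skipped.
            continue
        # Split on the first tab only, so any later tabs stay in the analysis.
        token, sep, analysis = line.partition("\t")
        if not sep:
            raise ValueError(f"expected a tab-separated line, got: {line!r}")
        yield token, analysis


def load_corpus(path: str, encoding: str = "utf-8") -> List[Tuple[str, str]]:
    """Read a whole annotation file; the encoding is an assumption."""
    with open(path, encoding=encoding) as handle:
        return list(read_annotations(handle))
```

For example, `list(read_annotations(["tokenA\tanalysisA\n", "tokenB\tanalysisB\n"]))` returns the two placeholder pairs in order. If the format holds as described, `len(load_corpus(path))` on the Xitsonga file should come to the 69,584 tokens listed under Format extent, which gives a quick sanity check after download.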
This item appears in the following Collection(s)
- Resource Catalogue [350]
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.