Morphologically annotated corpus for Setswana
Title | Morphologically annotated corpus for Setswana |
Description | NCHLT corpus of morphologically annotated Setswana tokens, converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is provided as txt files. Each line consists of a token and its morphological analysis, separated by a tab. The Setswana file contains a total of 72,609 tokens. All data was converted automatically, then manually checked and, where necessary, re-annotated by linguistic experts, and finally quality controlled. Please see the included protocol for details on the morphological tags used. |
Contact name | T. Gaustad |
Contact email | tanja.gaustad@nwu.ac.za |
Publisher(s) | Centre for Text Technology (CTexT) |
License | CC BY 4.0 |
Language(s) | Setswana |
Author(s) | Gaustad, Tanja |
Contributor | McKellar, Cindy |
Subject | morphology; annotated |
URI | https://hdl.handle.net/20.500.12185/674 |
Media type | Text |
Media category | annotated text corpus |
Format extent | 72,609 tokens |
Version | 1.0 |
Format size | 2 MB |
Format medium | N/A |
Project | Linguistic corpus enrichment for South African languages |
Submit date | 2024-03-27T08:25:41Z |
Date available | 2024-03-27T08:25:41Z |
Date created | 2024-01-31 |
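The description above states that each line of the txt files holds a token and its morphological analysis, separated by a tab. A minimal sketch of reading that layout in Python follows. The function name `read_annotations` is hypothetical, and the sample strings are placeholders rather than real corpus entries or real tags. Skipping blank lines and raising an error on lines without a tab are assumptions, since the record does not say how such lines are handled.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_annotations(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated lines.

    Assumptions (not stated in the record): blank lines carry no data and
    are skipped; a non-blank line without a tab is treated as an error.
    """
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumed: blank lines are separators, not data
        # Split on the first tab only, so the analysis is kept whole.
        token, sep, analysis = line.partition("\t")
        if not sep:
            raise ValueError(f"line {line_number}: no tab separator found")
        yield token, analysis


# Placeholder input, NOT real corpus content or real morphological tags.
sample = "token1\tanalysis1\ntoken2\tanalysis2\n"
pairs = list(read_annotations(io.StringIO(sample)))
print(len(pairs), "token/analysis pairs read")
```

To read the downloaded file itself, the same function could be passed a handle opened with `open(path, encoding="utf-8")`, where `path` points to the txt file. The UTF-8 encoding is an assumption; the record does not specify the file name or its encoding.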
This item appears in the following Collection(s)
- Resource Catalogue [350]
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.