Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/674'
Morphologically annotated corpus for Setswana
Date
2024-01-31
Authors
Gaustad, Tanja
Publisher
Centre for Text Technology (CTexT)
Description
NCHLT corpus of morphologically annotated tokens in Setswana converted to the tags used during phases 1 and 2 of the SADiLaR-II project.
The data is provided as plain-text (.txt) files. Each line contains a token and its morphological analysis, separated by a tab.
The Setswana file contains 72,609 tokens in total. All data was converted automatically, then manually checked and, where necessary, re-annotated by linguistic experts, and finally quality controlled. See the included protocol for details on the morphological tags used.
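The line format described above (token, tab, morphological analysis) can be read with a few lines of Python. The following is a minimal sketch and not part of the deposited resource: the function name parse_annotated_lines is hypothetical, and both the UTF-8 encoding and the skipping of blank lines are assumptions that should be checked against the included protocol.

```python
from typing import Iterable, Iterator, Tuple


def parse_annotated_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated lines.

    Hypothetical helper: assumes one token per line, with the token and
    its morphological analysis separated by the first tab character.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines carry no annotation and are skipped.
            continue
        token, sep, analysis = line.partition("\t")
        if not sep:
            raise ValueError(f"expected a tab-separated line, got: {line!r}")
        yield token, analysis


def load_corpus(path: str) -> list:
    """Read a whole file into a list of (token, analysis) pairs.

    Assumption: the file is UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as handle:
        return list(parse_annotated_lines(handle))
```

Calling load_corpus with the path to the downloaded Setswana file should return one pair per annotated token, so the length of the result can be compared against the 72,609 tokens stated above as a quick sanity check.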
License
CC BY 4.0
Verification status
Level 0