Morphologically annotated corpus for Sesotho
Title | Morphologically annotated corpus for Sesotho |
Description | NCHLT corpus of morphologically annotated Sesotho tokens, converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is provided as txt files. Each line consists of a token and its morphological analysis, separated by a tab. The Sesotho file contains 73,727 tokens. All data was converted automatically, then manually checked and, where necessary, re-annotated by linguistic experts, and finally quality controlled. See the included protocol for details on the morphological tags used. |
Contact name | T. Gaustad |
Contact email | tanja.gaustad@nwu.ac.za |
Publisher(s) | Centre for Text Technology (CTexT) |
License | CC BY 4.0 |
Language(s) | Sesotho |
Author(s) | Gaustad, Tanja |
Contributor | McKellar, Cindy |
Subject | morphology; annotated |
URI | https://hdl.handle.net/20.500.12185/676 |
Media type | Text |
Media category | annotated text corpus |
Format extent | 73,727 tokens |
Version | 1.0 |
Format size | 2 MB |
Format medium | N/A |
Project | Linguistic corpus enrichment for South African languages |
Submit date | 2024-03-27T08:26:05Z |
Date available | 2024-03-27T08:26:05Z |
Date created | 2024-01-31 |
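The description states that each line holds a token and its morphological analysis, separated by a tab. A minimal sketch of reading that layout in Python follows. The function names, the UTF-8 default, and the handling of blank lines are assumptions for illustration only; they are not part of the resource or its protocol.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_annotated_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, analysis) pairs from tab-separated lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines, if any, carry no annotation and are skipped.
            continue
        # Split on the first tab only, so the analysis is kept whole.
        token, sep, analysis = line.partition("\t")
        if not sep:
            raise ValueError(f"expected a tab-separated line, got: {line!r}")
        yield token, analysis


def read_corpus(path: str, encoding: str = "utf-8") -> List[Tuple[str, str]]:
    """Read a whole corpus file; the encoding is assumed, not documented."""
    with open(path, encoding=encoding) as handle:
        return list(parse_annotated_lines(handle))
```

Calling `read_corpus` with the path of the downloaded txt file would return a list of pairs. If the assumptions hold, its length should match the 73,727 tokens stated above, which makes for a quick sanity check.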
This item appears in the following Collection(s)
Resource Catalogue [350]
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.