Morphologically annotated corpus for Sepedi

dc.contact.email: tanja.gaustad@nwu.ac.za [en_ZA]
dc.contact.name: T. Gaustad [en_ZA]
dc.contributor.author: Gaustad, Tanja
dc.contributor.other: McKellar, Cindy
dc.date.accessioned: 2024-03-27T08:25:54Z
dc.date.available: 2024-03-27T08:25:54Z
dc.date.issued: 2024-01-31
dc.description: NCHLT corpus of morphologically annotated tokens in Sepedi converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is given as txt files. Each line consists of a token and the corresponding morphological analysis, tab separated. The file for Sepedi contains a total of 73,031 tokens. All the data has been automatically converted, then manually checked and re-annotated where necessary by linguistic experts as well as quality controlled. Please see the included protocol for more details on the morphological tags used. [en_ZA]
dc.format: text [en_ZA]
dc.format.extent: 73,031 tokens [en_ZA]
dc.format.medium: N/A [en_ZA]
dc.format.size: 2Mb [en_ZA]
dc.identifier.uri: https://hdl.handle.net/20.500.12185/675
dc.languages: Sepedi [en_ZA]
dc.media.category: annotated text corpus [en_ZA]
dc.media.type: Text [en_ZA]
dc.project: Linguistic corpus enrichment for South African languages [en_ZA]
dc.publisher: Centre for Text Technology (CTexT) [en_ZA]
dc.rights.license: CC BY 4.0 [en_ZA]
dc.subject: morphology [en_ZA]
dc.subject: annotated [en_ZA]
dc.title: Morphologically annotated corpus for Sepedi [en_ZA]
dc.version: 1.0 [en_ZA]
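
The description field in this record says the data file holds one token per line, followed by a tab and the token's morphological analysis. The following is a minimal Python sketch for reading a file in that layout. It rests on assumptions not confirmed by this record: the file is taken to be UTF-8 encoded, blank or tab-less lines are simply skipped, and the analysis is kept as an opaque string because the tag set is defined only in the included protocol. The function names parse_line and read_corpus are hypothetical.

```python
def parse_line(line):
    """Split one 'token<TAB>analysis' line into a (token, analysis) pair.

    Returns None for blank lines or lines without a tab (assumed malformed).
    """
    line = line.rstrip("\r\n")
    if not line.strip():
        return None
    parts = line.split("\t", 1)  # split on the first tab only
    if len(parts) != 2:
        return None
    token, analysis = parts
    return token, analysis


def read_corpus(path, encoding="utf-8"):
    """Read a tab-separated corpus file into a list of (token, analysis) pairs.

    The UTF-8 default is an assumption; the record does not state the encoding.
    """
    pairs = []
    with open(path, encoding=encoding) as handle:
        for line in handle:
            parsed = parse_line(line)
            if parsed is not None:
                pairs.append(parsed)
    return pairs
```

Calling read_corpus on the downloaded data file and comparing the length of the result with the 73,031 tokens stated in the record is one way to check whether these assumptions hold for the actual file.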

Files

Original bundle

Name: README.Morph.Final.2024-01-31.txt
Size: 2.4 KB
Format: Plain Text
Description: Read Me

Name: Protocol.SADiLaR.MorphologicalAnalysisSepedi.Final.2023-08-30.doc
Size: 361 KB
Format: Microsoft Word
Description: Morphological Annotation Protocol for Sepedi

Name: SADII-Ext.MorphDataNCHLTConverted.Final.2023-08-31.nso.txt
Size: 1.76 MB
Format: Plain Text
Description: Morphologically annotated corpus for Sepedi

License bundle

Name: license.txt
Size: 3.22 KB
Format: Item-specific license agreed upon to submission