isiNdebele Domain corpus POS annotated (4 domains)

dc.contact.email: tanja.gaustad@nwu.ac.za
dc.contact.name: Tanja Gaustad
dc.contributor.author: Gaustad, Tanja
dc.contributor.other: McKellar, Cindy
dc.contributor.other: Gent, Sunny
dc.date.accessioned: 2026-03-26T12:58:18Z
dc.date.available: 2026-03-26T12:58:18Z
dc.date.issued: 2026-03-31
dc.description: This deliverable contains part-of-speech tagged data from four different text types for isiNdebele. The text types included are:
- CAPS gr12 (Academic)
- MA/PhD Theses (Academic)
- News/Magazines (Non-Academic)
- Novels (Fiction)
The data is given as .txt files where each line contains a token and the corresponding POS tag, tab separated. Each text type data file contains 12,000+ tokens, amounting to a total of 61,094 tokens for the language. Please see the included protocol for more details on the POS tags used. See Tanja Gaustad, Roald Eiselen, Cindy McKellar (2026). Extension of Linguistic Resources for South African Languages: Part-of-Speech Annotated Domain-Specific Data. Proceedings of the Seventh Workshop on Resources for African Indigenous Languages (RAIL) (collocated with LREC 2026) for more detailed information.
dc.format: text
dc.format.extent: 61094 tokens
dc.format.medium: N/A
dc.format.size: 1 Mb
dc.identifier.uri: https://hdl.handle.net/20.500.12185/704
dc.languages: isiNdebele
dc.media.category: annotated domain-specific corpus
dc.media.type: Text
dc.project: Update and extension of linguistic resources and core technologies for South African languages
dc.publisher: North-West University - Centre for Text Technology (CTexT)
dc.rights.license: Creative Commons Attribution 4.0 International
dc.subject: isiNdebele, POS annotated, domain-specific, annotated corpus
dc.title: isiNdebele Domain corpus POS annotated (4 domains)
dc.version: 1.0
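
The description states that each line of a data file holds a token and its POS tag, separated by a tab. A minimal sketch of reading that layout is given below. The function name read_pos_file, the sample tokens and the sample tags are hypothetical placeholders and are not taken from the corpus or its protocol; the handling of blank lines is likewise an assumption, since the description does not say whether the files contain any.

```python
import io
from collections import Counter


def read_pos_file(handle):
    """Yield (token, tag) pairs from a text stream with one tab-separated pair per line."""
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines, if present, carry no token and are skipped.
            continue
        token, sep, tag = line.partition("\t")
        if not sep:
            # No tab found, so the line does not match the described layout.
            raise ValueError(f"line {line_number}: expected 'token<TAB>tag', got {line!r}")
        yield token, tag


# Hypothetical sample data; real tokens and tags come from the corpus files.
sample = io.StringIO("tokenA\tTAG1\ntokenB\tTAG2\n\ntokenC\tTAG1\n")
pairs = list(read_pos_file(sample))
tag_counts = Counter(tag for _, tag in pairs)
print(len(pairs), dict(tag_counts))
```

To apply the same function to a downloaded data file, pass it an open file object, for example open(path, encoding="utf-8"). The UTF-8 encoding is assumed here and should be checked against the README in the bundle.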

Files

Original bundle

Now showing 1 - 5 of 6

Name: Protocol.SADiLaR.PartOfSpeechTaggingIsiNdebele.Final.2026-03-31.docx
Size: 51.86 KB
Format: Microsoft Word XML

Name: README.POS.GenreData.Final.nr.2026-03-31.txt
Size: 2.15 KB
Format: Plain Text

Name: SAD-IV.Caps.POS.2026-03-23.nr.txt
Size: 213.04 KB
Format: Plain Text

Name: SAD-IV.NewsMagazines.POS.2026-03-23.nr.txt
Size: 262.07 KB
Format: Plain Text

Name: SAD-IV.Novels.POS.2026-03-23.nr.txt
Size: 231.73 KB
Format: Plain Text

License bundle

Name: license.txt
Size: 3.22 KB
Format: Item-specific license agreed upon to submission