Project: SADiLaR IV (Extension): Update and extension of linguistic resources and core technologies for South African languages
Type: 56,903 tokens in Tshivenḓa (ve_ZA), annotated for part of speech, from four different text types (final version)
Languages: Tshivenḓa (ve_ZA)
Date: 2026-03-31
Version: 1.0 (Final)

Description: This deliverable contains part-of-speech (POS) tagged data for Tshivenḓa from four different text types:
- CAPS gr12 (Academic)
- MA/PhD Theses (Academic)
- News/Magazines (Non-Academic)
- Novels (Fiction)

The data is provided as .txt files in which each line contains one token and its POS tag, separated by a tab. Each text-type file contains more than 11,000 tokens, for a total of 56,903 tokens for the language. See the included protocol for details of the POS tags used.

For more detailed information, see: Tanja Gaustad, Roald Eiselen, Cindy McKellar (2026). Extension of Linguistic Resources for South African Languages: Part-of-Speech Annotated Domain-Specific Data. Proceedings of the Seventh Workshop on Resources for African Indigenous Languages (RAIL) (co-located with LREC 2026).
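The token-per-line format described above can be read with a few lines of Python. This is a minimal sketch, not part of the release: it assumes the files are UTF-8 encoded and that any blank lines are sentence breaks to be skipped (neither point is stated in this README). The function names `read_tagged` and `count_tags` are hypothetical, and the sample lines in the demo are placeholders, not real tokens or tags from the protocol.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_tagged(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, tag) pairs from lines of the form 'token<TAB>tag'.

    Assumption: blank lines (if present) are sentence breaks and are skipped.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines rather than treating them as tokens
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}"
            )
        token, tag = fields
        yield token, tag


def count_tags(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Return (number of tokens, frequency of each POS tag)."""
    tags = Counter(tag for _, tag in read_tagged(lines))
    return sum(tags.values()), tags


def count_file(path: str) -> Tuple[int, Counter]:
    """Apply count_tags to one annotated .txt file (path is up to the caller)."""
    # Assumed UTF-8; 'utf-8-sig' also tolerates a leading byte-order mark.
    with open(path, encoding="utf-8-sig") as handle:
        return count_tags(handle)


if __name__ == "__main__":
    # Hypothetical placeholder lines standing in for a real data file.
    sample = io.StringIO("tokenA\tTAG1\ntokenB\tTAG2\n\ntokenC\tTAG1\n")
    n_tokens, tag_freq = count_tags(sample)
    print(n_tokens)  # 3
```

If each non-blank line is exactly one token, running `count_file` over each released file could serve as a check against the per-file token counts listed under Contents. The strict two-field check is a deliberate choice: it surfaces malformed lines instead of silently miscounting them.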
Contents

Language and text type                      | Tokens
---------------------------------------------------------
Tshivenḓa CAPS gr12 (POS annotated)         | 11,166
Tshivenḓa MA/PhD Theses (POS annotated)     | 11,525
Tshivenḓa News/Magazines (POS annotated)    | 22,355
Tshivenḓa Novels (POS annotated)            | 11,857
---------------------------------------------------------
Total Tshivenḓa                             | 56,903

SADiLaR website: https://sadilar.org
_________________________________________________________________________________

Licence for final (v1.0) distribution: Creative Commons Attribution 4.0 International
URL: http://creativecommons.org/licenses/by/4.0/
Attribute work to: CTexT® (Centre for Text Technology, North-West University), South Africa; SADiLaR (South African Centre for Digital Language Resources), South Africa.
Attribute work to URL: http://humanities.nwu.ac.za/ctext; https://sadilar.org