Creative Commons Attribution 4.0 International
Gaustad, Tanja; McKellar, Cindy; Gent, Sunny
2026-03-26; 2026-03-26; 2026-03-31
https://hdl.handle.net/20.500.12185/701

This deliverable contains part-of-speech tagged data from five different text types for isiZulu. The text types included are:
- CAPS gr12 (Academic)
- MA/PhD Theses (Academic)
- Magazines (Non-Academic)
- News (Non-Academic)
- Novels (Fiction)

The data is given as txt files where each line contains a token and the corresponding POS tag, tab separated. Each text type data file contains 11,000+ tokens, amounting to a total of 67,875 tokens for the language. Please see the included protocol for more details on the POS tags used.

This data is a combination of new data with the previously published smaller data set "POS annotated corpus with 5 different text types for isiZulu" (https://hdl.handle.net/20.500.12185/671). Please see Tanja Gaustad, Roald Eiselen, Cindy McKellar (2026). Extension of Linguistic Resources for South African Languages: Part-of-Speech Annotated Domain-Specific Data. Proceedings of the Seventh Workshop on Resources for African Indigenous Languages (RAIL) (collocated with LREC 2026) for more detailed information.

text
67875 tokens
N/A
isiZulu, POS annotated, domain-specific, annotated corpus
isiZulu Domain corpus POS annotated (5 domains)
1 Mb
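
The file layout described in this record (one token and its POS tag per line, tab separated) could be read with a short Python sketch like the one below. It is only an illustration: the function names `read_tagged_file` and `tag_counts` are hypothetical, UTF-8 encoding is an assumption, and skipping blank lines is an assumption about how the files might be laid out. The included protocol remains the reference for the actual tag set.

```python
from collections import Counter


def read_tagged_file(path):
    """Return a list of (token, tag) pairs from a tab-separated txt file.

    Assumptions (not confirmed by the record): the file is UTF-8 encoded,
    and any blank lines carry no token and can be skipped.
    """
    pairs = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # assumed: blank lines are separators, not data
            # Split on the first tab: left side is the token, right side the tag.
            token, _, tag = line.partition("\t")
            pairs.append((token, tag))
    return pairs


def tag_counts(pairs):
    """Count how often each POS tag occurs in a list of (token, tag) pairs."""
    return Counter(tag for _, tag in pairs)
```

Calling `read_tagged_file` on one of the text type files and passing the result to `tag_counts` would give a per-tag frequency table, and `len(pairs)` would give that file's token count for comparison with the figures stated in the description.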