Department of Science, Technology and Innovation
CLARIN in South Africa

Afrikaans Domain corpus POS annotated (5 domains)

Description

This deliverable contains part-of-speech (POS) tagged data from five different text types for Afrikaans. The text types included are:

- CAPS gr12 (Academic)
- MA/PhD Theses (Academic)
- Magazines (Non-Academic)
- News (Non-Academic)
- Novels (Fiction)

The data is provided as txt files in which each line contains a token and its corresponding POS tag, separated by a tab. Each text type's data file contains 11,000+ tokens, for a total of 60,809 tokens for the language. Please see the included protocol for more details on the POS tags used.
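As an illustration of the file layout described above (one token and its POS tag per line, tab separated), the following is a minimal Python sketch for reading such a file and counting tags. The function names, the UTF-8 encoding, and the handling of blank lines are assumptions made for this example and are not specified by the deliverable. The tags `X1` and `X2` in the usage note are placeholders, not tags from the included protocol.

```python
from collections import Counter
from pathlib import Path


def parse_pos_lines(lines):
    """Yield (token, tag) pairs from 'token<TAB>tag' lines.

    Assumption: blank lines (if any) carry no token and are skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        # Split on the last tab so the tag is always the final field.
        token, sep, tag = line.rpartition("\t")
        if not sep:
            raise ValueError(f"expected 'token<TAB>tag', got: {line!r}")
        yield token, tag


def read_pos_file(path, encoding="utf-8"):
    """Read one data file into a list of (token, tag) pairs.

    Assumption: the files are UTF-8 encoded.
    """
    with Path(path).open(encoding=encoding) as handle:
        return list(parse_pos_lines(handle))


def tag_counts(pairs):
    """Count how often each POS tag occurs."""
    return Counter(tag for _, tag in pairs)
```

For example, `list(parse_pos_lines(["Die\tX1\n", "kat\tX2\n"]))` produces the pairs `("Die", "X1")` and `("kat", "X2")`. Passing the result of `read_pos_file` for a given text type to `tag_counts` gives a per-tag frequency table that can be checked against the protocol's tag set.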

Citation

License

Creative Commons Attribution 4.0 International

Verification status

Level 0