POS-Taggers for South African Languages (Linux)

1. Install hunpos (http://code.google.com/p/hunpos/)
2. [TokensToTag].txt - Input file: UTF-8 text, one token per line. Separate sentences with a single empty line.
3. [LanguageModel].model - Trained, language-specific model for POS-tagging. The supplied models are in ./models.
4. Tagger usage (command line): "cat [TokensToTag].txt | ./hunpos-tag [LanguageModel].model > [TaggedOutputFile].txt"
5. [TaggedOutputFile].txt - Output file: POS-tagged text. Each line holds a token and its tag, separated by a tab character ("Token<TAB>Tag").
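
The input and output formats in steps 2 and 5 can be handled with a few lines of Python. The following is a minimal sketch, not part of this distribution: the function names are hypothetical, and the parser assumes that the tagger output keeps sentences separated by empty lines. The optional tag_text helper simply reproduces the command in step 4, with both paths supplied by the caller.

```python
import subprocess


def to_hunpos_input(sentences):
    """Render tokenised sentences as one token per line.

    Each sentence, including the last one, is followed by a single empty line.
    """
    blocks = ["\n".join(tokens) for tokens in sentences]
    return "\n\n".join(blocks) + "\n\n"


def parse_hunpos_output(text):
    """Read "Token<TAB>Tag" lines back into lists of (token, tag) pairs.

    Assumption: empty lines mark sentence boundaries in the output.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # An empty line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        token, _, tag = line.partition("\t")
        current.append((token, tag.strip()))
    if current:
        sentences.append(current)
    return sentences


def tag_text(hunpos_tag_path, model_path, text):
    """Pipe text through hunpos-tag, as in step 4 (paths are caller-supplied)."""
    result = subprocess.run(
        [hunpos_tag_path, model_path],
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

A call such as parse_hunpos_output(tag_text("./hunpos-tag", "[LanguageModel].model", to_hunpos_input(sentences))) would then return the tagged sentences, where sentences is a list of token lists.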

See the hunpos documentation for more information on file formats and on training new models.

License

All files are distributed under the Creative Commons Attribution 2.5 South Africa license.
_______________________________________________
License: Creative Commons Attribution 2.5 South Africa
URL: http://creativecommons.org/licenses/by/2.5/za/

Attribute work to: South African Department of Arts and Culture & Centre for Text Technology (CTexT, North-West University, South Africa)

Attribute work to URL: http://www.nwu.ac.za/ctext 
______________________________________________