License

These files are distributed under the Creative Commons Attribution 2.5 South Africa license. 

All files except CRFSharp.dll, CRFSharpWrapper.dll and AdvUtils.dll are distributed under this license.
_______________________________________________
License: Creative Commons Attribution 2.5 South Africa
URL: http://creativecommons.org/licenses/by/2.5/za/

Attribute work to: South African Department of Arts and Culture & Centre for Text Technology (CTexT, North-West University, South Africa)

Attribute work to URL: http://www.nwu.ac.za/ctext 
______________________________________________

CRFSharp.dll, CRFSharpWrapper.dll and AdvUtils.dll are distributed under the New BSD License.
Copyright (c) 2011, Zhongkai Fu
All rights reserved.
______________________________________________
Requirements:
Microsoft .NET Framework 4.5 (http://www.microsoft.com/en-za/download/details.aspx?id=30653)

Additional resources:
"Taggers" directory with the NCHLT Phase I POS tagger.
"NER-Models" directory with NER models for ten South African languages.
"PC-Models" directory with phrase chunking (PC) models for ten South African languages.
All files are required; removing any of them may cause one or more of the components to behave inconsistently.
______________________________________________

Input format: UTF-8 encoded text file containing running text.
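
The encoding requirement above can be checked before a file is passed to the tool. The following is a minimal Python sketch, not part of the distribution; the function names is_utf8 and file_is_utf8 are hypothetical and only illustrate the idea of rejecting files that do not decode as UTF-8.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path: str) -> bool:
    """Read a whole file as bytes and check that it is valid UTF-8."""
    with open(path, "rb") as handle:
        return is_utf8(handle.read())
```

A file that fails this check would need to be converted to UTF-8 before processing.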
______________________________________________

The executable NCHLT-Taggers.exe can either be run as a GUI application, or from the command line.
When no arguments are specified, the GUI will be launched.
To get the usage information for the command line tool, run the following command:
	NCHLT-Taggers.exe -?
______________________________________________

To run the command line version of the tools:
1. Open a command prompt
2. In the prompt, navigate to the directory containing NCHLT-Taggers.exe
3. The following usage information provides the command line options:
	Usage: NCHLT-Taggers.exe -i <InputTextFiles> -l <Language> -c <CoreTech>
        e.g. NCHLT-Taggers.exe -i isiZulu.Input.txt -l ZU -c PC
        -i|input        Input text file(s) to process for the particular language. To run multiple files, separate file paths with a ';'
        -l|language     Processing language. Must be one of the following:
                <AF|NSO|NR|SS|ST|TN|TS|VE|XH|ZU>
        -c|coretech     Core technology to run. Must be one of the following:
                <token|sentence|pos|ner|pc>
4. The output file is created automatically. Its name is the name of the processed input file with the name of the core technology added to the beginning.
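
The steps above can also be scripted. The following Python sketch is an illustration only and is not shipped with the tools: build_command and run_tagger are hypothetical helper names, and the sketch assumes that NCHLT-Taggers.exe is in the current working directory. The language and core technology values are checked against the lists in the usage text, but passed through exactly as given, because the usage example writes "PC" while the option list writes "pc".

```python
import subprocess

# Values copied from the usage text above.
LANGUAGES = {"AF", "NSO", "NR", "SS", "ST", "TN", "TS", "VE", "XH", "ZU"}
CORE_TECHS = {"token", "sentence", "pos", "ner", "pc"}


def build_command(input_files, language, coretech, exe="NCHLT-Taggers.exe"):
    """Return the argument list for one command line run of the tool."""
    if not input_files:
        raise ValueError("at least one input file is required")
    if language.upper() not in LANGUAGES:
        raise ValueError("unknown language: " + language)
    if coretech.lower() not in CORE_TECHS:
        raise ValueError("unknown core technology: " + coretech)
    # Multiple input files are joined with ';' as described for -i.
    return [exe, "-i", ";".join(input_files), "-l", language, "-c", coretech]


def run_tagger(input_files, language, coretech):
    """Run the tool and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(input_files, language, coretech), check=True)
```

For example, build_command(["isiZulu.Input.txt"], "ZU", "PC") produces the same arguments as the example command in step 3.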
______________________________________________

Output file format: The output file is a tab-delimited text file containing each token followed by its assigned class.
Output classes for named entity recognition:
	B-/I-ORG	Organisation
	B-/I-PER	Person
	B-/I-LOC	Location
	B-/I-MISC	Miscellaneous
	OUT			Outside
	
Output classes for phrase chunking:
	B-/I-NOUN	Noun phrase
	B-/I-VERB	Verb phrase
	B-/I-PREP	Prepositional phrase
	B-/I-ADJ	Adjective phrase
	B-/I-ADV	Adverbial phrase
	OUT			Outside
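
The B-/I- classes listed above can be grouped back into multi-token entities or phrases. The following Python sketch is an illustration under stated assumptions and is not part of the distribution: it assumes one token per line with the class in the last tab-separated column, that B- starts a unit and I- continues a unit of the same type, and the names read_tagged and group_spans are hypothetical. The sample lines in the usage note are invented for the example, not actual tool output.

```python
def read_tagged(lines):
    """Yield (token, tag) pairs from tab-delimited output lines."""
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) < 2 or not parts[0]:
            continue  # skip blank or malformed lines
        yield parts[0], parts[-1].strip()


def group_spans(pairs):
    """Group B-/I- tagged tokens into (label, [tokens]) spans."""
    spans = []
    label, tokens = None, []
    for token, tag in pairs:
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label:
            tokens.append(token)  # continue the open span
            continue
        if tokens:
            spans.append((label, tokens))  # close the open span
        if prefix in ("B", "I") and tag_label:
            # Start a new span; a stray I- tag is treated like B-.
            label, tokens = tag_label, [token]
        else:
            label, tokens = None, []  # OUT or unrecognised class
    if tokens:
        spans.append((label, tokens))
    return spans
```

Given the invented lines "Thabo<TAB>B-PER", "Mbeki<TAB>I-PER", "visited<TAB>OUT", "Pretoria<TAB>B-LOC", group_spans(read_tagged(lines)) yields one PER span with two tokens and one LOC span with one token.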

Please see the associated protocols for more information on the annotation scheme.
