NCHLT Tagger

Date

2016-04-29

Authors

Roald Eiselen

Publisher

North-West University
Centre for Text Technology (CTexT)

Description

A graphical user interface and command-line tool that automatically annotates running text with one or more linguistic tags:
* Part of Speech
* Named entity type
* Phrase chunks

Available for the following languages:
* Afrikaans
* English
* isiNdebele
* isiXhosa
* isiZulu
* Sesotho sa Leboa (Sepedi)
* Setswana
* Sesotho (Southern Sotho)
* Siswati
* Tshivenda
* Xitsonga

File formats (as described in the Readme.txt):
* Input format: UTF-8 text file containing running text.
* Output format: tab-delimited text file containing each token followed by its assigned class.

Output classes for named entity recognition:
* B-/I-ORG: Organisation
* B-/I-PER: Person
* B-/I-LOC: Location
* B-/I-MISC: Miscellaneous
* OUT: Outside
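To illustrate how the output format described above might be consumed, here is a minimal Python sketch. It is not part of the NCHLT Tagger distribution. It assumes one token per line, with the token in the first tab-separated column and the assigned class in the last; the Readme excerpt does not spell out this layout, so check it against a real output file. The function names `parse_tagged_lines` and `group_entities` are hypothetical, and the sample tokens are invented for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def parse_tagged_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read (token, class) pairs from tab-delimited lines.

    Assumption: one token per line, token first, class last.
    Blank lines and lines without a tab are skipped.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        pairs.append((parts[0], parts[-1]))
    return pairs


def group_entities(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (entity text, entity type) pairs.

    Tokens tagged OUT (or any tag without a B-/I- prefix) close the
    entity currently being built.
    """
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    current: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if tokens:
            entities.append((" ".join(tokens), current))

    for token, tag in pairs:
        prefix, entity_type = tag[:2], tag[2:]
        if prefix == "I-" and tokens and entity_type == current:
            # Continuation of the entity being built.
            tokens.append(token)
        elif prefix in ("B-", "I-"):
            # A B- tag, or an I- tag with no matching open entity,
            # starts a new entity.
            flush()
            tokens, current = [token], entity_type
        else:
            flush()
            tokens, current = [], None
    flush()
    return entities


if __name__ == "__main__":
    # Invented sample in the assumed layout, not real tagger output.
    sample = [
        "Nelson\tB-PER",
        "Mandela\tI-PER",
        "visited\tOUT",
        "Pretoria\tB-LOC",
    ]
    print(group_entities(parse_tagged_lines(sample)))
```

For a real file, the lines could be read with `open(path, encoding="utf-8")`, since the tool works with UTF-8 text.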

Verification status

Level 0