Afrikaans text unit identification data
Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/507'
dc.contact.email | martin.puttkammer@nwu.ac.za | en_ZA |
dc.contact.name | Martin Puttkammer | en_ZA |
dc.contributor.author | Puttkammer, Martin | |
dc.date.accessioned | 2019-04-15T14:05:43Z | |
dc.date.available | 2019-04-15T14:05:43Z | |
dc.date.issued | 2006 | |
dc.description | This dataset was developed during a master's degree and used in the development of a text unit identifier capable of tagging sentences, named entities, words, abbreviations and punctuation in Afrikaans text. The dataset consists of 39,762 tokens, containing 3,294 named entities in 1,581 sentences. The data was manually annotated by the author and verified by an independent linguist according to the tagset developed during the same study. Details on the annotation and tagset used are available in the publication mentioned above in (2). The data is also presented in CoNLL-2002 format (Sang, E. F., & De Meulder, F. (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. Available at: https://www.aclweb.org/anthology/W02-2024). | en_ZA |
dc.format | .txt | en_ZA |
dc.format.extent | 39,762 tokens | en_ZA |
dc.format.medium | N/A | en_ZA |
dc.format.size | 195,324 bytes (zipped) | en_ZA |
dc.identifier.citation | Puttkammer, M.J. 2006. Outomatiese Afrikaanse tekseenheididentifisering. Potchefstroom: North-West University. (Dissertation - MA). | en_ZA |
dc.identifier.uri | https://hdl.handle.net/20.500.12185/507 | |
dc.language.iso | afr | |
dc.languages | Afrikaans | en_ZA |
dc.media.category | Monolingual text corpus: annotated | en_ZA |
dc.media.type | Text | en_ZA |
dc.publisher | Centre for Text Technology, North-West University | en_ZA |
dc.rights.license | Creative Commons Attribution 4.0 International: https://creativecommons.org/licenses/by/4.0/ | en_ZA |
dc.subject | Afrikaans, Tokenisation, Sentence recognition, Named-entity recognition, sentence, named-entity, word, token | en_ZA |
dc.title | Afrikaans text unit identification data | en_ZA |
dc.version | 1.0 | en_ZA |
local.collection.primary | Resource Catalogue | |
local.collection.secondary | Resource Index | |
local.url | http://humanities.nwu.ac.za/ctext | en_ZA |
Files
Original bundle
- Name:
- TEI_v1.0.rar
- Size:
- 190.75 KB
- Format:
- A RAR file (short for Roshal Archive Compressed file) is a compressed file, or data container, that holds one or more other files and folders. Unlike a normal folder, a RAR file needs special software to open and extract its contents.
- Description:
- All files zipped
License bundle
- Name:
- license.txt
- Size:
- 3.23 KB
- Format:
- Item-specific license agreed to upon submission