Core technologies for conjunctively written South African languages
Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/548'
dc.contact.email | Martin.Puttkammer@nwu.ac.za | en_ZA |
dc.contact.name | Martin Puttkammer | en_ZA |
dc.contributor.author | Du Toit, Jaco | |
dc.contributor.author | Puttkammer, Martin | |
dc.contributor.other | Gent, Sunny | |
dc.contributor.other | Gaustad, Tanja | |
dc.date.accessioned | 2021-12-06T10:57:03Z | |
dc.date.available | 2021-12-06T10:57:03Z | |
dc.date.issued | 2021-03-31 | |
dc.description | During this SADiLaR-funded project, enriched corpora were developed for the four official South African languages with a conjunctive orthography, i.e. isiNdebele (NR), isiXhosa (XH), isiZulu (ZU), and Siswati (SS). Each language’s corpus consists of approximately 50,000 tokens and is parallel at sentence level, with English as the source language. Each corpus was annotated on three levels, namely morphological analysis, part of speech, and lemmatisation (see: https://repo.sadilar.org/handle/20.500.12185/546). Using the annotated data, 12 core technologies, i.e. a morphological analyser, a POS tagger, and a lemmatiser for each of the four languages, were developed and packaged in a single graphical user interface (UI). | en_ZA |
dc.identifier.uri | https://hdl.handle.net/20.500.12185/548 | |
dc.languages | isiNdebele | en_ZA |
dc.languages | isiXhosa | en_ZA |
dc.languages | isiZulu | en_ZA |
dc.languages | Siswati | en_ZA |
dc.project | Linguistic corpus enrichment for conjunctively written South African languages | |
dc.publisher | North-West University, Centre for Language Technology (CTexT) | en_ZA |
dc.subject | part of speech | en_ZA |
dc.subject | part of speech tagging | en_ZA |
dc.subject | part-of-speech tagging | en_ZA |
dc.subject | part-of-speech | en_ZA |
dc.subject | lemma | en_ZA |
dc.subject | lemmatisation | en_ZA |
dc.subject | lemmatization | en_ZA |
dc.subject | morphology | en_ZA |
dc.subject | morphological analysis | en_ZA |
dc.subject | conjunctive languages | en_ZA |
dc.subject | Nguni languages | en_ZA |
dc.title | Core technologies for conjunctively written South African languages | en_ZA |
dc.type | Modules |
Files
Original bundle
- Name: CoreTechInterface.zip
- Size: 375.51 MB
- Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
License bundle
- Name: license.txt
- Size: 3.23 KB
- Format: Item-specific license agreed upon to submission