
Core technologies for conjunctively written South African languages

dc.contact.email: Martin.Puttkammer@nwu.ac.za [en_ZA]
dc.contact.name: Martin Puttkammer [en_ZA]
dc.contributor.author: Du Toit, Jaco
dc.contributor.author: Puttkammer, Martin
dc.contributor.other: Gent, Sunny
dc.contributor.other: Gaustad, Tanja
dc.date.accessioned: 2021-12-06T10:57:03Z
dc.date.available: 2021-12-06T10:57:03Z
dc.date.issued: 2021-03-31
dc.description: During this SADiLaR-funded project, enriched corpora were developed for the four official South African languages that use a conjunctive orthography, i.e. isiNdebele (NR), isiXhosa (XH), isiZulu (ZU) and Siswati (SS). Each language's corpus consists of approximately 50,000 tokens and is parallel at sentence level, with English as the source language. Each corpus was annotated on three levels, namely morphological analysis, part of speech and lemmatisation (see: https://repo.sadilar.org/handle/20.500.12185/546). Using the annotated data, 12 core technologies (a morphological analyser, a POS tagger and a lemmatiser for each of the four languages) were developed and packaged in a single graphical user interface (UI). [en_ZA]
dc.identifier.uri: https://hdl.handle.net/20.500.12185/548
dc.languages: isiNdebele [en_ZA]
dc.languages: isiXhosa [en_ZA]
dc.languages: isiZulu [en_ZA]
dc.languages: Siswati [en_ZA]
dc.project: Linguistic corpus enrichment for conjunctively written South African languages
dc.publisher: North-West University, Centre for Language Technology (CTexT) [en_ZA]
dc.subject: part of speech [en_ZA]
dc.subject: part of speech tagging [en_ZA]
dc.subject: part-of-speech tagging [en_ZA]
dc.subject: part-of-speech [en_ZA]
dc.subject: lemma [en_ZA]
dc.subject: lemmatisation [en_ZA]
dc.subject: lemmatization [en_ZA]
dc.subject: morphology [en_ZA]
dc.subject: morphological analysis [en_ZA]
dc.subject: conjunctive languages [en_ZA]
dc.subject: Nguni languages [en_ZA]
dc.title: Core technologies for conjunctively written South African languages [en_ZA]
dc.type: Modules

Files

Original bundle

Name: CoreTechInterface.zip
Size: 375.51 MB
Format: ZIP, an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.

License bundle

Name: license.txt
Size: 3.23 KB
Format: Item-specific license agreed to upon submission