
CKarma
CKarma is a compound analyser for Afrikaans that detects word boundaries within compounds. It takes a string as input and produces an analysed string as output, without any tags. For example, the string "hondehokdak" ('dog house roof') is analysed as "hond _ e + hok + dak", where the plus sign marks the beginning of an independent constituent and the underscore marks the beginning of a dependent constituent (i.e. a valence morpheme). CKarma is a C5 classifier, trained on data consisting of approximately 47,000 compounds and 7,000 non-compounds. The resulting decision tree and cases can be converted to C code by means of a script written by MM van Zaanen. This C code can then be integrated into any other system.
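The output notation described above can be illustrated with a small sketch. This is not part of CKarma itself: the function names `parse_analysis` and `surface_form` are hypothetical, and the code assumes that markers and constituents are separated by whitespace, as in the "hond _ e + hok + dak" example, and that the first constituent (which carries no marker) is independent.

```python
from __future__ import annotations


def parse_analysis(analysed: str) -> list[tuple[str, str]]:
    """Split an analysed string into (constituent, kind) pairs.

    Assumption: tokens are whitespace-separated, "+" introduces an
    independent constituent and "_" introduces a dependent one.
    """
    parts = []
    kind = "independent"  # assumed: the unmarked first constituent is independent
    for token in analysed.split():
        if token == "+":
            kind = "independent"
        elif token == "_":
            kind = "dependent"
        else:
            parts.append((token, kind))
    return parts


def surface_form(analysed: str) -> str:
    """Rebuild the original compound by joining all constituents."""
    return "".join(text for text, _ in parse_analysis(analysed))


if __name__ == "__main__":
    for text, kind in parse_analysis("hond _ e + hok + dak"):
        print(f"{text}\t{kind}")
    print(surface_form("hond _ e + hok + dak"))
```

Joining the constituents back together is a simple way to check that an analysed string still corresponds to the input compound.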
Martin Puttkammer
Martin.Puttkammer@nwu.ac.za
North-West University; Centre for Text Technology (CTexT)
Afrikaans
https://hdl.handle.net/20.500.12185/145
Text
Modules
Compound Analyser
N/A
Resource Index
afr
2018-02-05T07:33:07Z; 2018-03-05T14:58:06Z
2018-02-05T07:33:07Z; 2018-03-05T14:58:06Z
2015-01-30


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • Resource Index [412]
    A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
