Monolingual isiXhosa corpus

dc.contact.email: tanja.gaustad@nwu.ac.za [en_ZA]
dc.contact.name: Tanja Gaustad [en_ZA]
dc.contributor.author: McKellar, Cindy
dc.date.accessioned: 2020-01-14T07:24:18Z
dc.date.available: 2020-01-14T07:24:18Z
dc.date.issued: 2019-11-30
dc.description: Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data sourced for the DAC funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into isiXhosa project. The data comprises a total of 233,192 segments with 2,424,706 isiXhosa words. The first 48,213 lines of the data file contain the original Autshumato project data and the remaining part of the file contains the new data. [en_ZA]
dc.format: txt [en_ZA]
dc.format.extent: 233,192 Segments; 2,424,706 isiXhosa words [en_ZA]
dc.format.medium: N/A [en_ZA]
dc.format.size: 21,7 Mb [en_ZA]
dc.identifier.citation: None [en_ZA]
dc.identifier.uri: https://hdl.handle.net/20.500.12185/524
dc.language.iso: xho
dc.languages: isiXhosa [en_ZA]
dc.media.category: Monolingual text corpora [en_ZA]
dc.media.type: Text [en_ZA]
dc.project: Parallel corpora for English into isiXhosa [en_ZA]
dc.publisher: North-West University - Centre for Text Technology (CTexT) [en_ZA]
dc.rights.license: Creative Commons Attribution 4.0 International: http://creativecommons.org/licenses/by/4.0/ [en_ZA]
dc.subject: Monolingual corpus; isiXhosa [en_ZA]
dc.title: Monolingual isiXhosa corpus [en_ZA]
dc.version: 1.0 (Final) [en_ZA]
local.collection.primary: Resource Catalogue
local.collection.secondary: Resource Index
local.url: N/A [en_ZA]
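
The description says the data file holds one segment per line, with the first 48,213 lines coming from the original Autshumato project and the rest from the newer SADiLaR project. A minimal Python sketch of how a downloaded copy might be split along that boundary is given below. The function names (split_segments, count_words, load_corpus) are hypothetical and not part of the resource, and the whitespace-based word count is an assumption, since the record does not say how its 2,424,706 words were counted.

```python
from itertools import islice
from pathlib import Path

# Boundary stated in the record's description: the first 48,213 lines are
# the original Autshumato data; every later line is the newer data.
AUTSHUMATO_LINES = 48_213


def split_segments(lines, boundary=AUTSHUMATO_LINES):
    """Return (original, new) lists of segments, split after `boundary` lines."""
    stripped = (line.rstrip("\r\n") for line in lines)
    original = list(islice(stripped, boundary))  # first `boundary` segments
    new = list(stripped)                         # whatever remains
    return original, new


def count_words(segments):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return sum(len(segment.split()) for segment in segments)


def load_corpus(path):
    """Hypothetical helper: read the file as UTF-8 and split it in two parts."""
    with Path(path).open(encoding="utf-8") as handle:
        return split_segments(handle)
```

Calling load_corpus on the downloaded text file should return two lists whose combined length can be compared with the 233,192 segments stated in the record. The total from count_words may differ from the published word count if a different tokenisation was used.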

Files

Original bundle

Name: Corpus.SADiLaR.English-isiXhosaDrop-Monolingual.1.0.0.CAM.2019-11-15.xh.txt
Size: 21.74 MB
Format: Plain Text
Description: Monolingual corpus for isiXhosa.

License bundle

Name: license.txt
Size: 3.23 KB
Format: Item-specific license agreed upon to submission