
Autshumato Monolingual isiXhosa Monolingual corpus

dc.contact.email: tanja.gaustad@nwu.ac.za
dc.contact.name: Tanja Gaustad
dc.contributor.author: McKellar, Cindy
dc.date.accessioned: 2025-07-31T08:52:59Z
dc.date.available: 2025-07-31T08:52:59Z
dc.date.issued: 2025-06-10
dc.description: Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data sourced for the DAC funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into isiXhosa project. NOTE: Version 2.0 has been processed in the same way as the other Autshumato resources. Content: 341,330 Segments; 4,328,245 XH Words
dc.format: text
dc.format.extent: 341,330 Segments; 4,328,245 XH Words
dc.format.medium: N/A
dc.format.size: 39 MB
dc.identifier.uri: https://hdl.handle.net/20.500.12185/692
dc.languages: isiXhosa
dc.media.category: Monolingual text corpora
dc.media.type: Text
dc.project: Parallel corpora for English into isiXhosa
dc.publisher: North-West University - Centre for Text Technology (CTexT)
dc.rights.license: Creative Commons Attribution 4.0 International: http://creativecommons.org/licenses/by/4.0/
dc.subject: monolingual corpora, isiXhosa, Machine translation
dc.title: Autshumato Monolingual isiXhosa Monolingual corpus
dc.version: 2.0
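
Because the description states that the corpus is a single UTF-8 text file with one segment per line, a rough sanity check of a downloaded copy could look like the sketch below. It is only an illustration: the local path, the helper name `count_segments_and_words`, the choice to skip blank lines, and whitespace tokenisation are all assumptions, so the word total it reports may not match the published 4,328,245 exactly.

```python
from pathlib import Path
from typing import Iterable, Tuple

# File name as listed in this record; the local location is an assumption.
CORPUS_PATH = Path("MonolingualCorpus.SADiLaR.isiXhosa.2.0.1.CAM.2025-06-10.xh.txt")


def count_segments_and_words(lines: Iterable[str]) -> Tuple[int, int]:
    """Count non-empty segments and whitespace-delimited tokens.

    Hypothetical helper: the record does not say how words were counted,
    so plain whitespace splitting is used here as a stand-in.
    """
    segments = 0
    words = 0
    for line in lines:
        text = line.strip()
        if not text:
            # Assumption: blank lines are not counted as segments.
            continue
        segments += 1
        words += len(text.split())
    return segments, words


if __name__ == "__main__":
    if CORPUS_PATH.exists():
        # Stream the file line by line instead of loading all of it at once.
        with CORPUS_PATH.open(encoding="utf-8") as handle:
            n_segments, n_words = count_segments_and_words(handle)
        print(f"{n_segments} segments, {n_words} words")
    else:
        print(f"Corpus file not found at {CORPUS_PATH}")
```

Comparing the printed segment count with the 341,330 segments given in dc.format.extent is a quick way to confirm that a download is complete.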

Files

Original bundle

Name: MonolingualCorpus.SADiLaR.isiXhosa.2.0.1.CAM.2025-06-10.xh.txt
Size: 38.81 MB
Format: Plain Text

Name: README_monolingual.txt
Size: 1.46 KB
Format: Plain Text

License bundle

Name: license.txt
Size: 3.22 KB
Format: Item-specific license agreed upon to submission