Department of Science, Technology and Innovation
CLARIN in South Africa
 

Autshumato English-isiXhosa Parallel corpus

dc.contact.email: tanja.gaustad@nwu.ac.za
dc.contact.name: Tanja Gaustad
dc.contributor.author: McKellar, Cindy
dc.date.accessioned: 2025-07-29T12:41:49Z
dc.date.available: 2025-07-29T12:41:49Z
dc.date.issued: 2025-06-10
dc.description: Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with one segment per line. The dataset contains existing data sourced for the DAC-funded Autshumato project, as well as new data sourced for the SADiLaR project "Parallel corpora for English into isiXhosa". NOTE: Version 2.0 has been processed in the same way as the other Autshumato resources. Content: 109,940 segments; 1,745,236 English words; 1,264,390 isiXhosa words.
dc.format: text
dc.format.extent: 109,940 segments; 1,745,236 English words; 1,264,390 isiXhosa words
dc.format.medium: N/A
dc.format.size: 21 MB
dc.identifier.uri: https://hdl.handle.net/20.500.12185/691
dc.languages: English
dc.languages: isiXhosa
dc.media.category: Parallel text corpora
dc.media.type: Text
dc.project: Parallel corpora for English into isiXhosa
dc.publisher: North-West University - Centre for Text Technology (CTexT)
dc.rights.license: Creative Commons Attribution 4.0 International: http://creativecommons.org/licenses/by/4.0/
dc.subject: parallel corpora, isiXhosa, English, machine translation
dc.title: Autshumato English-isiXhosa Parallel corpus
dc.version: 2.0
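The segment and word totals in dc.format.extent can be checked locally after download. The sketch below is a minimal, hypothetical helper (`corpus_stats` and `file_stats` are illustrative names, not part of the distribution) that counts segments as lines and words as whitespace-separated tokens. That tokenization is an assumption: the record does not say how words were counted, so word totals may not match the published figures exactly.

```python
from typing import Iterable, Tuple


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (segments, words) for one side of the corpus.

    Assumption: one segment per line, and words counted by splitting on
    whitespace. The publisher's own counting method is not documented.
    """
    segments = 0
    words = 0
    for line in lines:
        segments += 1
        words += len(line.split())
    return segments, words


def file_stats(path: str) -> Tuple[int, int]:
    """Count segments and whitespace-separated words in a UTF-8 text file."""
    # "utf-8-sig" decodes plain UTF-8 and also drops a leading byte-order mark if present.
    with open(path, encoding="utf-8-sig") as f:
        return corpus_stats(f)


if __name__ == "__main__":
    import sys

    # Pass one or more corpus files on the command line.
    for path in sys.argv[1:]:
        segs, words = file_stats(path)
        print(f"{path}: {segs:,} segments, {words:,} words")
```

Saved under a hypothetical name such as corpus_stats.py and run with the English and isiXhosa files as arguments, both sides should report 109,940 segments if the download matches this record.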

Files

Original bundle

Name: BilingualCorpus.SADiLaR.English-isiXhosa.2.0.0.CAM.2025-06-10.en.txt
Size: 10.27 MB
Format: Plain Text

Name: BilingualCorpus.SADiLaR.English-isiXhosa.2.0.0.CAM.2025-06-10.xh.txt
Size: 11.26 MB
Format: Plain Text

Name: README_bilingual.txt
Size: 1.61 KB
Format: Plain Text
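Because the corpus ships as two plain-text files rather than a single aligned file, the segments have to be re-paired when loaded. The sketch below assumes, as is usual for line-based parallel corpora, that line N of the .en.txt file corresponds to line N of the .xh.txt file; the function names are hypothetical. It stops with an error if the two files have different line counts, since that would silently misalign every following pair.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple

EN_FILE = "BilingualCorpus.SADiLaR.English-isiXhosa.2.0.0.CAM.2025-06-10.en.txt"
XH_FILE = "BilingualCorpus.SADiLaR.English-isiXhosa.2.0.0.CAM.2025-06-10.xh.txt"

_MISSING = object()  # fill value marking that the shorter file has run out of lines


def aligned_pairs(en_lines: Iterable[str], xh_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (English, isiXhosa) segment pairs matched by line position.

    Assumption: the files are aligned line by line.
    Raises ValueError if one side has more lines than the other.
    """
    for n, (en, xh) in enumerate(zip_longest(en_lines, xh_lines, fillvalue=_MISSING), start=1):
        if en is _MISSING or xh is _MISSING:
            raise ValueError(f"line counts differ: one file ends before line {n}")
        # Strip the line terminator (LF or CRLF) but keep any other whitespace.
        yield en.rstrip("\r\n"), xh.rstrip("\r\n")


def load_corpus(en_path: str = EN_FILE, xh_path: str = XH_FILE) -> List[Tuple[str, str]]:
    """Read both UTF-8 files and return the aligned segment pairs as a list."""
    with open(en_path, encoding="utf-8-sig") as en_f, open(xh_path, encoding="utf-8-sig") as xh_f:
        return list(aligned_pairs(en_f, xh_f))
```

Run in the directory holding both downloaded files, `load_corpus()` should return 109,940 tuples if the line counts agree with the record. The sketch reads everything into memory, which is reasonable at the stated 21 MB; for larger corpora, iterating over `aligned_pairs` directly avoids building the full list.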

License bundle

Name: license.txt
Size: 3.22 KB
Format: Item-specific license agreed upon to submission