
Bilingual English-isiXhosa corpus

dc.contact.email: tanja.gaustad@nwu.ac.za [en_ZA]
dc.contact.name: Tanja Gaustad [en_ZA]
dc.contributor.author: McKellar, Cindy
dc.date.accessioned: 2020-01-14T07:24:24Z
dc.date.available: 2020-01-14T07:24:24Z
dc.date.issued: 2019-11-30
dc.description: Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with each segment on a newline. Dataset contains existing data sourced for the DAC funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into isiXhosa project. The data comprises a total of 126,708 segments with 1,948,207 English words and 1,378,285 isiXhosa words. The first 22,785 lines of the data files contain the original Autshumato project data and the remaining part of the files contain the new data. [en_ZA]
dc.format: txt [en_ZA]
dc.format.extent: 126,708 Segments; 1,948,207 English words; 1,378,285 isiXhosa words [en_ZA]
dc.format.medium: N/A [en_ZA]
dc.format.size: En 11,4 Mb, Xh 12,2 Mb [en_ZA]
dc.identifier.citation: None [en_ZA]
dc.identifier.uri: https://hdl.handle.net/20.500.12185/525
dc.language.iso: eng
dc.language.iso: xho
dc.languages: English [en_ZA]
dc.languages: isiXhosa [en_ZA]
dc.languages.other: N/A [en_ZA]
dc.media.category: Multilingual text corpora: Aligned [en_ZA]
dc.media.type: Text [en_ZA]
dc.project: Parallel corpora for English into isiXhosa [en_ZA]
dc.publisher: North-West University - Centre for Text Technology (CTexT) [en_ZA]
dc.rights.license: Creative Commons Attribution 4.0 International: http://creativecommons.org/licenses/by/4.0/ [en_ZA]
dc.subject: Aligned data, parallel corpus, English, isiXhosa [en_ZA]
dc.title: Bilingual English-isiXhosa corpus [en_ZA]
dc.version: 1.0 (Final) [en_ZA]
local.collection.primary: Resource Catalogue
local.collection.secondary: Resource Index
local.url: N/A [en_ZA]

Files

Original bundle

Name: Corpus.SADiLaR.English-isiXhosaDrop-Bilingual.1.0.0.CAM.2019-11-15.en.txt
Size: 11.43 MB
Format: Plain Text
Description: Aligned parallel corpora for the following language pair: English-isiXhosa.
Name: Corpus.SADiLaR.English-isiXhosaDrop-Bilingual.1.0.0.CAM.2019-11-15.xh.txt
Size: 12.23 MB
Format: Plain Text
Description: Aligned parallel corpora for the following language pair: English-isiXhosa.
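
The record describes the two files as line-aligned UTF-8 text, with the first 22,785 lines holding the original Autshumato data and the rest holding the new data. A minimal Python sketch of reading the pair and splitting it at that boundary might look like the following. The function names (`read_segments`, `load_parallel`, `split_by_source`) are hypothetical helpers introduced for illustration and are not part of the dataset or any published tooling.

```python
# Boundary taken from the record's description: the first 22,785 lines
# are the original Autshumato project data.
AUTSHUMATO_LINES = 22_785


def read_segments(path):
    """Read one segment per line from a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        # Strip only the trailing newline so the segment text is kept as-is.
        return [line.rstrip("\n") for line in handle]


def load_parallel(en_path, xh_path):
    """Return a list of (English, isiXhosa) pairs, aligned by line number."""
    en_segments = read_segments(en_path)
    xh_segments = read_segments(xh_path)
    if len(en_segments) != len(xh_segments):
        raise ValueError(
            f"Line counts differ: {len(en_segments)} English vs "
            f"{len(xh_segments)} isiXhosa segments"
        )
    return list(zip(en_segments, xh_segments))


def split_by_source(pairs, boundary=AUTSHUMATO_LINES):
    """Split aligned pairs into (Autshumato data, new data) at the boundary."""
    return pairs[:boundary], pairs[boundary:]
```

Calling `load_parallel` with the paths of the downloaded `.en.txt` and `.xh.txt` files and passing the result to `split_by_source` would give the two portions separately. If the files match the record, the full list should hold 126,708 pairs; this sketch does not verify that figure itself.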

License bundle

Name: license.txt
Size: 3.23 KB
Format: Item-specific license agreed upon to submission
Description: