Autshumato English-Tshivenḓa Parallel Corpora

dc.contact.email: sunny.gent@nwu.ac.za [en_ZA]
dc.contact.name: Sunny Gent [en_ZA]
dc.contributor.author: McKellar, Cindy
dc.contributor.other: Puttkammer, Martin
dc.contributor.other: Gaustad, Tanja
dc.contributor.other: Gent, Sunny
dc.contributor.other: van Heerden, Jacques
dc.date.accessioned: 2024-03-27T08:27:23Z
dc.date.available: 2024-03-27T08:27:23Z
dc.date.issued: 2023-12-12
dc.description: Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced from translated material and created by translating English sentences into Tshivenḓa. The data is given as two separate UTF-8 text files, with each aligned segment on a newline. [en_ZA]
dc.format: Txt [en_ZA]
dc.format.extent: There are 110,367 English-Tshivenḓa segments, consisting of 2,000,657 English words and 2,527,789 Tshivenḓa words. [en_ZA]
dc.format.size: 9.74Mb [en_ZA]
dc.identifier.uri: https://hdl.handle.net/20.500.12185/682
dc.languages: English [en_ZA]
dc.languages: Tshivenda [en_ZA]
dc.media.category: Multilingual text corpora: Aligned [en_ZA]
dc.media.type: Text [en_ZA]
dc.project: Autshumato [en_ZA]
dc.publisher: North-West University; Centre for Text Technology (CTexT) [en_ZA]
dc.rights.license: Creative Commons Attribution 4.0 International: https://creativecommons.org/licenses/by/4.0/deed.en [en_ZA]
dc.subject: Autshumato V [en_ZA]
dc.subject: Aligned parallel corpora [en_ZA]
dc.subject: Tshivenḓa [en_ZA]
dc.title: Autshumato English-Tshivenḓa Parallel Corpora [en_ZA]
dc.version: 3.0 (Final) [en_ZA]
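
The description above says the corpus ships as two separate UTF-8 text files with one aligned segment per line. A minimal Python sketch of how such a pair of files might be read and summarised follows. The function names and any file paths passed to them are hypothetical, since the record does not list the file names inside the archive. Words are counted by splitting on whitespace, which is an assumption: the record does not say how the published word counts were computed, so the totals from this sketch may not match them exactly.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterable, Iterator, Tuple, Union

PathLike = Union[str, Path]


def read_aligned_segments(src_path: PathLike, tgt_path: PathLike) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from two line-aligned UTF-8 text files.

    Line N of the first file is paired with line N of the second file.
    A ValueError is raised if one file has more lines than the other.
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\r\n"), t.rstrip("\r\n")


def corpus_stats(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int, int]:
    """Return (segments, source_words, target_words).

    Assumption: a "word" is any whitespace-delimited token.
    """
    segments = src_words = tgt_words = 0
    for s, t in pairs:
        segments += 1
        src_words += len(s.split())
        tgt_words += len(t.split())
    return segments, src_words, tgt_words
```

Calling `corpus_stats(read_aligned_segments(english_file, tshivenda_file))` on the two extracted files, with the paths substituted for whatever names the archive actually contains, gives figures that can be compared against the segment and word counts listed under dc.format.extent.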

Files

Original bundle

Name: lcontent.SADILAR.BilingualCorpus(EN-VE).3.0.0.CAM.2023-12-12.en.zip
Size: 9.74 MB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
Description: Bilingual Corpus (EN-VE)

License bundle

Name: license.txt
Size: 3.22 KB
Format: Item-specific license agreed to upon submission
Description: