Autshumato English-Setswana Parallel Corpora

dc.contact.email: sunny.gent@nwu.ac.za [en_ZA]
dc.contact.name: Sunny Gent [en_ZA]
dc.contributor.author: McKellar, Cindy
dc.contributor.other: Gaustad Van Zaanen, Tanja
dc.contributor.other: Puttkammer, Martin
dc.contributor.other: Gent, Sunny
dc.contributor.other: van Heerden, Jacques
dc.date.accessioned: 2022-12-15T06:35:12Z
dc.date.available: 2022-12-15T06:35:12Z
dc.date.issued: 2022-09-30
dc.description: Aligned parallel corpora for the English-Setswana language pair. The data is provided as two separate UTF-8 text files, with each aligned segment on its own line. The data was selected and formatted specifically for training machine translation systems. Further clean-up and processing may be required, depending on the task for which the data is reused. [en_ZA]
dc.format: Txt [en_ZA]
dc.format.extent: Aligned Segments: 238 475; English Words: 3 583 483; Setswana Words: 4 874 105 [en_ZA]
dc.format.medium: Text; UTF-8 [en_ZA]
dc.format.size: 18.0 MB (zipped) [en_ZA]
dc.identifier.uri: https://hdl.handle.net/20.500.12185/578
dc.languages: English [en_ZA]
dc.languages: Setswana [en_ZA]
dc.media.category: Multilingual text corpora: Aligned [en_ZA]
dc.media.type: Text [en_ZA]
dc.project: Autshumato VI [en_ZA]
dc.publisher: CTexT® (Centre for Text Technology, North-West University) [en_ZA]
dc.rights.license: Creative Commons Attribution 4.0 International [en_ZA]
dc.subject: Autshumato [en_ZA]
dc.subject: English [en_ZA]
dc.subject: Setswana [en_ZA]
dc.title: Autshumato English-Setswana Parallel Corpora [en_ZA]
dc.version: 2.0 (Final) [en_ZA]
local.url: http://autshumato.sourceforge.net/ [en_ZA]
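
The description in this record says the corpus consists of two UTF-8 text files with one aligned segment per line. A minimal sketch of how such a pair of files might be read in Python follows. The function name and the file paths passed to it are hypothetical; the actual file names inside the ZIP archive are not given in this record. Raising an error when the line counts differ is an assumption about how misalignment should be handled, not something the record specifies.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_aligned_segments(english_path: str, setswana_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (english, setswana) pairs, pairing line N of one file with line N of the other.

    Both paths are placeholders supplied by the caller; this record does not
    name the files contained in the archive.
    """
    with open(english_path, encoding="utf-8") as en_file, \
         open(setswana_path, encoding="utf-8") as tn_file:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        for line_no, (en, tn) in enumerate(zip_longest(en_file, tn_file), start=1):
            if en is None or tn is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en.rstrip("\n"), tn.rstrip("\n")
```

Because the function is a generator, segments are read lazily, which keeps memory use low for a corpus of this size. As the description notes, further clean-up may still be needed depending on the task.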

Files

Original bundle

Name: Autshumato.BilingualCorpus(English-Setswana).v2.0.zip
Size: 18.06 MB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
Description: Autshumato.BilingualCorpus(English-Setswana).v2.0

License bundle

Name: license.txt
Size: 3.23 KB
Format: Item-specific license agreed to upon submission