Autshumato English-Setswana Parallel Corpora
Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/404'
dc.contact.email | sunny.gent@nwu.ac.za | |
dc.contact.name | Sunny Gent | |
dc.contributor.author | Cindy McKellar | |
dc.contributor.other | Roald Eiselen | |
dc.contributor.other | Wikus Pienaar | |
dc.database | Multilingual Text Corpora: Aligned | |
dc.date.accessioned | 2018-02-05T20:22:42Z | |
dc.date.accessioned | 2018-03-05T17:49:36Z | |
dc.date.available | 2018-02-05T20:22:42Z | |
dc.date.available | 2018-03-05T17:49:36Z | |
dc.date.issued | 2016-10-28 | |
dc.description | Aligned English-Setswana parallel corpus. This set contains data that was translated by professional translators, data that was sourced as translated file pairs from translators, and data obtained from government websites and documents. The data is given as six separate UTF-8 text files, with each aligned sentence pair on a new line. | |
dc.format.extent | 9.02 MB (zipped) | |
dc.format.medium | Text | |
dc.format.medium | UTF8 | |
dc.format.size | 159 000 bilingual segments; 2 037 173 English words (excluding punctuation and numbers); 2 596 023 Setswana words (excluding punctuation and numbers). | |
dc.identifier.islrn | 379-219-829-093-2 | |
dc.identifier.uri | https://hdl.handle.net/20.500.12185/404 | |
dc.language.iso | eng | |
dc.language.iso | tsn | |
dc.languages | English | |
dc.languages | Setswana | |
dc.media.category | Multilingual text corpora: Aligned | |
dc.media.type | Text | |
dc.project | Autshumato | |
dc.publisher | North-West University | |
dc.publisher | Centre for Text Technology (CTexT) | |
dc.rights.license | Creative Commons Attribution 2.5 South Africa License: http://creativecommons.org/licenses/by/2.5/za/legalcode | |
dc.source | Based on documents from the South African government domain crawled from gov.za websites and collected from various language units. | |
dc.stratum | Details provided in documentation. | |
dc.title | Autshumato English-Setswana Parallel Corpora | |
dc.type | Data | |
dc.version | 1 | |
local.collection.primary | Resource Catalogue | |
local.collection.secondary | Resource Index |
Files
Original bundle
- Name: autshumato_english-setswana_parallel_corpora.zip
- Size: 9.02 MB
- Format: ZIP, an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
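
The record describes the data as UTF-8 text files with each aligned sentence pair on a new line, but it does not list the member names or say how the two languages are split across files. The sketch below is one possible way to read the archive under a stated assumption: that an English file and a Setswana file are line-aligned, so line i of one corresponds to line i of the other. The function name `read_aligned_pairs` and its parameters are hypothetical; check the documentation shipped in the archive before relying on this layout.

```python
import io
import zipfile
from itertools import zip_longest
from typing import Iterator, Tuple


def read_aligned_pairs(archive, english_member: str,
                       setswana_member: str) -> Iterator[Tuple[str, str]]:
    """Yield (english, setswana) sentence pairs from a ZIP archive.

    Assumption (not confirmed by the record): the two members are
    line-aligned UTF-8 text files of equal length.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(english_member) as en_raw, zf.open(setswana_member) as tn_raw:
            # Decode the raw byte streams as UTF-8, the encoding named in the record.
            en_lines = io.TextIOWrapper(en_raw, encoding="utf-8")
            tn_lines = io.TextIOWrapper(tn_raw, encoding="utf-8")
            for number, (en, tn) in enumerate(zip_longest(en_lines, tn_lines), start=1):
                # A missing line on either side means the files are not aligned.
                if en is None or tn is None:
                    raise ValueError(f"files differ in length at line {number}")
                yield en.rstrip("\n"), tn.rstrip("\n")
```

To use it, pass the path to autshumato_english-setswana_parallel_corpora.zip together with the names of two members found via `zipfile.ZipFile(path).namelist()`. Raising on a length mismatch, rather than silently truncating, makes a wrong pairing of files visible early.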