Autshumato English-Setswana Parallel Corpora

Authors: Cindy McKellar, Roald Eiselen, Wikus Pienaar
Dates: 2018-02-05, 2018-03-05, 2016-10-28
Handle: https://hdl.handle.net/20.500.12185/404
License: Creative Commons Attribution 2.5 South Africa License (http://creativecommons.org/licenses/by/2.5/za/legalcode)

Description: Aligned English-Setswana parallel corpus. This set contains data that was translated by professional translators, data that was sourced from translators as translated file pairs, and data obtained from Government websites and documents. The data is given as six separate UTF-8 text files, with each aligned sentence pair on a new line.

Size: 9.02 MB (zipped)
Format: Text
Encoding: UTF-8
Language: eng
Type: Data
Identifier: 379-219-829-093-2
Extent: 159 000 bilingual segments; 2 037 173 English words and 2 596 023 Setswana words (both counts exclude punctuation and numbers).
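The record describes the data only as UTF-8 text files with one aligned sentence pair per line. A minimal sketch for loading such data follows, assuming (not confirmed by the record) that the six files form English/Setswana pairs in which line i of the English file translates line i of the Setswana file. The function name `read_parallel` and any file names you pass to it are hypothetical; if a file instead holds both sides on one line (for example tab-separated), the reader would need to split each line instead.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(en_path: str, tn_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (English, Setswana) sentence pairs from two line-aligned files.

    Hypothetical layout: one sentence per line, with line i of each file
    forming a translation pair. Check the actual corpus files first.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(tn_path, encoding="utf-8") as tn_file:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of silently truncated.
        for line_no, (en, tn) in enumerate(zip_longest(en_file, tn_file), start=1):
            if en is None or tn is None:
                raise ValueError(f"files are misaligned at line {line_no}")
            # Drop the trailing newline but keep all other characters.
            yield en.rstrip("\n"), tn.rstrip("\n")
```

Raising on a length mismatch is a deliberate choice: in a line-aligned corpus, one missing or extra line shifts every later pair, so failing loudly is safer than returning silently misaligned data.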