Creative Commons Attribution 4.0 International
McKellar, Cindy
2022-06-01
2022-06-01
2022-03-31
https://hdl.handle.net/20.500.12185/560
Aligned parallel corpora for the following language pair: English-SiSwati. The data is given as four separate UTF-8 text files, with each segment on a new line. The dataset contains existing data sourced for the DSAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into SiSwati project. It contains the following types of bilingual data: translations from English to Siswati and crawled parallel data for English-Siswati. The dataset comprises a total of 114,839 segments with 2,002,293 English words and 1,423,414 SiSwati words.
Text
114,839 segments with 2,002,293 English words and 1,423,414 Siswati words
N/A
Siswati, aligned data, multilingual, translations, crawled, machine translation training data
Bilingual English-Siswati Corpus
9.54 Mb
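
The record states that the data is distributed as UTF-8 text files with one segment per line. A minimal sketch of reading such data is given here, assuming that an English file and a Siswati file are aligned line by line; the record does not say how the four files are organised or named, so the function names `read_parallel` and `count_words` and any paths passed to them are hypothetical.

```python
from itertools import zip_longest


def read_parallel(src_path, tgt_path):
    """Yield (source, target) segment pairs from two line-aligned UTF-8 files.

    Assumption: line i of src_path is the translation of line i of tgt_path.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for n, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            # zip_longest pads the shorter file with None, which exposes
            # a misalignment instead of silently truncating.
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {n}")
            yield s.rstrip("\n"), t.rstrip("\n")


def count_words(segments):
    """Count whitespace-separated tokens across an iterable of segments."""
    return sum(len(seg.split()) for seg in segments)
```

Calling `list(read_parallel(english_path, siswati_path))` on a matching pair of files would give the aligned segments. Whitespace tokenisation is only a rough check against the word totals quoted in the record, since the counting method used for those totals is not stated.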