Title: Autshumato English-isiXhosa Parallel corpus
Author: McKellar, Cindy
Dates: 2025-07-29; 2025-07-29; 2025-06-10
URI: https://hdl.handle.net/20.500.12185/691
License: Creative Commons Attribution 4.0 International: http://creativecommons.org/licenses/by/4.0/
Type: text
Size: 21MB
Keywords: parallel corpora, isiXhosa, English, machine translation

Description: Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with one segment per line. The dataset contains existing data sourced for the DAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into isiXhosa project. NOTE: Version 2.0 has been processed in the same way as the other Autshumato resources.

Content: 109,940 segments; 1,745,236 English words; 1,264,390 isiXhosa words
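
The description says the corpus is distributed as two separate UTF-8 text files with one segment per line, so segment i of the English file pairs with segment i of the isiXhosa file. A minimal sketch of loading the two files as aligned pairs follows. The function name `read_parallel` and any file names passed to it are hypothetical, since the record does not name the files, and the check that both files have the same number of lines is an assumption about how alignment should be validated.

```python
def read_parallel(src_path, tgt_path):
    """Return a list of (source, target) segment pairs from two line-aligned files.

    Assumes both files are UTF-8 and hold one segment per line, as the
    record describes. The paths are supplied by the caller; the record
    does not give the actual file names.
    """
    with open(src_path, encoding="utf-8") as src_file:
        # Iterating a text-mode file splits only on real line endings.
        src = [line.rstrip("\n") for line in src_file]
    with open(tgt_path, encoding="utf-8") as tgt_file:
        tgt = [line.rstrip("\n") for line in tgt_file]

    # Line-aligned corpora only make sense if both sides have equal length.
    if len(src) != len(tgt):
        raise ValueError(
            f"segment count mismatch: {len(src)} source vs {len(tgt)} target"
        )
    return list(zip(src, tgt))
```

Calling `read_parallel("english.txt", "isixhosa.txt")` with the real file names substituted would give a list whose length can be compared against the 109,940 segments stated in the record. Word totals obtained by splitting on whitespace may differ from the published counts, because the record does not say how words were tokenized.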