Autshumato English-isiZulu Parallel Corpora
Aligned parallel corpora for the language pair English-isiZulu. The data is provided as two separate UTF-8 text files, with each aligned segment on its own line, so that line N of one file corresponds to line N of the other. The data was specifically selected and formatted for training machine translation systems. Further clean-up and processing may be required, depending on the task for which the data is reused.
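The line-aligned layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the corpus distribution: the function names `read_parallel` and `drop_empty` are hypothetical, the file paths are supplied by the caller because the actual file names inside the archive are not stated here, and dropping blank segments is only one assumed example of the "further clean-up" the description mentions.

```python
from itertools import zip_longest


def read_parallel(en_path, zu_path):
    """Yield (english, isizulu) segment pairs from two line-aligned UTF-8 files.

    Both paths are supplied by the caller; the real file names in the
    archive are not assumed here.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(zu_path, encoding="utf-8") as zu_file:
        # zip_longest pads the shorter file with None, which lets us
        # detect a length mismatch instead of silently truncating.
        for line_no, (en, zu) in enumerate(zip_longest(en_file, zu_file), start=1):
            if en is None or zu is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en.rstrip("\n"), zu.rstrip("\n")


def drop_empty(pairs):
    """Keep only pairs where both sides contain non-whitespace text.

    This is an assumed, illustrative clean-up step, not a documented one.
    """
    return [(en, zu) for en, zu in pairs if en.strip() and zu.strip()]
```

A call such as `drop_empty(read_parallel("english.txt", "isizulu.txt"))`, with the two placeholder paths replaced by the extracted files, would return a list of segment pairs. Raising an error on a length mismatch is a deliberate choice in this sketch, since a misaligned pair of files would otherwise pair every later segment with the wrong translation.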
Sunny Gent
CTexT® (Centre for Text Technology, North-West University)
Creative Commons Attribution 4.0 International
English; isiZulu
McKellar, Cindy
Gaustad Van Zaanen, Tanja; Puttkammer, Martin; Gent, Sunny; van Heerden, Jacques
Autshumato; English; isiZulu
Multilingual text corpora: Aligned
Aligned segments: 233 691; English words: 4 148 245; isiZulu words: 2 910 800
2.0 (Final)
19.9 MB (zipped)
Text; UTF-8
Autshumato VI

