Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/691'
Autshumato English-isiXhosa Parallel corpus
Date
2025-06-10
Authors
McKellar, Cindy
Publisher
North-West University - Centre for Text Technology (CTexT)
Description
Aligned parallel corpora for the following language pair: English-isiXhosa.
The data is provided as two separate UTF-8 text files, one per language, with one segment per line.
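A minimal sketch of reading the two files as aligned pairs is given below. It assumes that line N of the English file corresponds to line N of the isiXhosa file, which the description implies but does not state outright. The function name `read_parallel` and the file names in the usage comment are hypothetical; the actual file names in the download may differ.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(en_path: str, xh_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (English, isiXhosa) segment pairs, matched by line number.

    Assumption: the two files are line-aligned, one segment per line.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(xh_path, encoding="utf-8") as xh_file:
        pairs = zip_longest(en_file, xh_file)
        for line_no, (en, xh) in enumerate(pairs, start=1):
            # zip_longest pads the shorter file with None, which lets us
            # detect a length mismatch instead of silently truncating.
            if en is None or xh is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en.rstrip("\n"), xh.rstrip("\n")


# Hypothetical usage (file names are placeholders):
# for en, xh in read_parallel("english.txt", "isixhosa.txt"):
#     print(en, "|||", xh)
```

Iterating over the file objects rather than calling `str.splitlines()` splits only on newline characters, so other Unicode separators inside a segment should not shift the alignment.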
The dataset contains existing data sourced for the DAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into isiXhosa project.
NOTE: Version 2.0 has been processed in the same way as the other Autshumato resources.
Content: 109,940 Segments; 1,745,236 English words; 1,264,390 isiXhosa words
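As a rough sanity check after downloading, the segment and word counts can be recomputed per file. The sketch below assumes words are whitespace-separated tokens; the record does not say how the published word counts were produced, so the totals may not match exactly. The function name `corpus_stats` and the file name in the usage comment are hypothetical.

```python
from typing import Tuple


def corpus_stats(path: str) -> Tuple[int, int]:
    """Return (segment_count, word_count) for one side of the corpus.

    Assumption: one segment per line, words split on whitespace.
    """
    segments = 0
    words = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            segments += 1
            # str.split() with no argument splits on any run of whitespace
            # and ignores leading/trailing whitespace.
            words += len(line.split())
    return segments, words


# Hypothetical usage (file name is a placeholder):
# segments, words = corpus_stats("english.txt")
# print(f"{segments:,} segments; {words:,} words")
```

Both files should report the same segment count if they are aligned line by line.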
Verification status
Level 0