
Autshumato English-isiXhosa Parallel corpus

Date

2025-06-10

Authors

McKellar, Cindy

Publisher

North-West University - Centre for Text Technology (CTexT)

Description

Aligned parallel corpus for the English-isiXhosa language pair. The data is provided as two separate UTF-8 text files, with one segment per line. The dataset contains existing data sourced for the DAC-funded Autshumato project as well as new data sourced for the SADiLaR project "Parallel corpora for English into isiXhosa". NOTE: Version 2.0 has been processed in the same way as the other Autshumato resources. Content: 109,940 segments; 1,745,236 English words; 1,264,390 isiXhosa words.
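
The two-file layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset: it assumes that line N of the English file pairs with line N of the isiXhosa file, and the function names and file paths are hypothetical. It counts words by splitting on whitespace, which may not match the tokenisation behind the published word counts.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(en_path: str, xh_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (English, isiXhosa) segment pairs, one pair per line."""
    with open(en_path, encoding="utf-8") as en_file, \
         open(xh_path, encoding="utf-8") as xh_file:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        for line_no, (en, xh) in enumerate(zip_longest(en_file, xh_file), start=1):
            if en is None or xh is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en.rstrip("\r\n"), xh.rstrip("\r\n")


def corpus_stats(en_path: str, xh_path: str) -> Tuple[int, int, int]:
    """Return (segments, English words, isiXhosa words).

    Words are counted by whitespace splitting (an assumption; the
    dataset description does not state how its counts were produced).
    """
    segments = en_words = xh_words = 0
    for en, xh in read_parallel(en_path, xh_path):
        segments += 1
        en_words += len(en.split())
        xh_words += len(xh.split())
    return segments, en_words, xh_words
```

Calling `corpus_stats` with the paths of the two downloaded files gives a quick sanity check that both files have the same number of lines before the corpus is used for training or evaluation.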

Verification status

Level 0