Department of Science, Technology and Innovation
CLARIN in South Africa
 

Autshumato Monolingual isiXhosa Monolingual corpus

Date

2025-06-10

Authors

McKellar, Cindy

Publisher

North-West University - Centre for Text Technology (CTexT)

Description

Monolingual corpus for isiXhosa. The data is provided as a single UTF-8 text file with one segment per line. The dataset combines existing data sourced for the DAC-funded Autshumato project with new data sourced for the SADiLaR project "Parallel corpora for English into isiXhosa". NOTE: Version 2.0 has been processed in the same way as the other Autshumato resources. Content: 341,330 segments; 4,328,245 isiXhosa (XH) words.
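
Because the corpus is a single UTF-8 text file with one segment per line, it can be streamed line by line without loading it into memory. The following is a minimal sketch in Python, not part of the distributed resource: the function name `corpus_stats` and the file name in the usage comment are hypothetical, and counting words by splitting on whitespace is an assumption that may not reproduce the published word count exactly.

```python
def corpus_stats(path):
    """Return (segments, words) for a one-segment-per-line UTF-8 text file.

    Assumption: a "word" is any whitespace-separated token, and blank
    lines are not counted as segments.
    """
    segments = 0
    words = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            segment = line.strip()
            if not segment:
                continue  # skip blank lines
            segments += 1
            words += len(segment.split())
    return segments, words


# Usage (hypothetical file name; substitute the downloaded corpus file):
# segments, words = corpus_stats("isixhosa_monolingual.txt")
# print(segments, words)
```

If the tokenisation used to produce the published figures differs from plain whitespace splitting, the word total from this sketch will differ slightly, while the segment count should be comparable.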

Verification status

Level 0