Creative Commons Attribution 4.0 International: http://creativecommons.org/licenses/by/4.0/
McKellar, Cindy
2025-07-31; 2025-07-31; 2025-06-10
https://hdl.handle.net/20.500.12185/692
Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data sourced for the DAC funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into isiXhosa project. NOTE: Version 2.0 has been processed in the same way as the other Autshumato resources. Content: 341,330 Segments; 4,328,245 XH Words
text
341,330 Segments; 4,328,245 XH Words
N/A
monolingual corpora, isiXhosa, Machine translation
Autshumato Monolingual isiXhosa Monolingual corpus
39 MB
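
Because the corpus is described as a single UTF-8 text file with one segment per line, a reader might sanity-check a download with something like the sketch below. It is only an illustration: the function names and the file name in the usage note are hypothetical, and whitespace splitting is an assumed tokenisation that may not reproduce the published segment and word counts exactly.

```python
from typing import Iterable, Tuple


def count_segments_and_words(lines: Iterable[str]) -> Tuple[int, int]:
    """Count non-empty lines (segments) and whitespace-separated tokens (words)."""
    segments = 0
    words = 0
    for line in lines:
        text = line.strip()
        if not text:
            # Skip blank lines; assumed not to count as segments.
            continue
        segments += 1
        # Assumption: words are separated by whitespace.
        words += len(text.split())
    return segments, words


def count_file(path: str) -> Tuple[int, int]:
    """Stream a UTF-8 corpus file line by line and return (segments, words)."""
    with open(path, encoding="utf-8") as handle:
        return count_segments_and_words(handle)
```

For example, `count_file("isixhosa_corpus.txt")` (a placeholder file name) would return a `(segments, words)` pair to compare against the figures listed in the record. Streaming line by line keeps memory use low for a file of this size.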