Autshumato English-Siswati Parallel Corpora

Date

2022-03-31

Authors

McKellar, Cindy

Publisher

North-West University - Centre for Text Technology (CTexT)

Description

Aligned parallel corpora for the English-Siswati language pair. The data is provided as four separate UTF-8 text files, with each segment on a new line. The dataset contains existing data sourced for the DSAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into SiSwati project. It contains the following types of bilingual data: translations from English to Siswati and crawled parallel data for English-Siswati. The dataset comprises a total of 114,839 segments, with 2,002,293 English words and 1,423,414 Siswati words. (A new version was issued because the title was changed.)
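Because each segment sits on its own line, a pair of files can be read side by side. The following is a minimal sketch, assuming that segment N in an English file corresponds to segment N in the matching Siswati file; the record does not name the four files or say how they are split, so the function names and any paths passed in are hypothetical. Words are counted by splitting on whitespace, which may not match the method used for the totals quoted above.

```python
def read_segments(path):
    """Read one UTF-8 text file and return its segments, one per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_parallel(english_path, siswati_path):
    """Pair up segments from two line-aligned files (alignment is an assumption)."""
    english = read_segments(english_path)
    siswati = read_segments(siswati_path)
    if len(english) != len(siswati):
        raise ValueError(
            f"segment counts differ: {len(english)} vs {len(siswati)}"
        )
    return list(zip(english, siswati))


def count_words(segments):
    """Count whitespace-separated tokens across all segments."""
    return sum(len(segment.split()) for segment in segments)
```

Calling `read_parallel` with the paths of one English file and its Siswati counterpart returns a list of `(english, siswati)` tuples, and raises `ValueError` if the two files do not contain the same number of lines.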

License

Creative Commons Attribution 4.0 International

Verification status

Level 0

Version History

Version | Date | Summary
2* | 2025-09-16 08:29:37 | A new version issued since the title was changed
1 | 2022-06-01 10:24:01 |
* Selected version