Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/559'
Monolingual Siswati Corpus
Date
2022-03-31
Authors
McKellar, Cindy
Publisher
North-West University - Centre for Text Technology (CTexT)
Abstract
Description
Monolingual corpus for SiSwati. The data is provided as a single UTF-8 text file, with one segment per line. The dataset contains existing data sourced for the DSAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into SiSwati project. The corpus comprises a total of 138,651 segments and 1,536,356 SiSwati words.
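Because the corpus is a single UTF-8 file with one segment per line, it can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset: the function name `corpus_stats` and the file name in the usage comment are hypothetical, and words are counted by splitting on whitespace, which is an assumption and may not reproduce the word count reported above exactly.

```python
def corpus_stats(path):
    """Count non-empty segments and whitespace-separated tokens in a corpus file.

    Assumes one segment per line and UTF-8 encoding, as stated in the
    dataset description. Whitespace tokenization is an assumption.
    """
    segments = 0
    words = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            segment = line.strip()
            if not segment:
                # Skip blank lines so they are not counted as segments.
                continue
            segments += 1
            words += len(segment.split())
    return segments, words


# Hypothetical usage; replace the file name with the path of the downloaded corpus:
# segments, words = corpus_stats("siswati_corpus.txt")
# print(segments, words)
```

Reading the file line by line keeps memory use low, which may be convenient for a corpus of this size.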
Keywords
Citation
License
Creative Commons Attribution 4.0 International
Collections
Verification status
Level 0