Monolingual Siswati Corpus
License agreement
By downloading this resource I accept and agree to the terms of use and the associated license conditions under which the resource is distributed.
Download
MD5: a15d49f8523e887d4a82ea982d92a4dd
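The MD5 value above can be used to check that the download completed intact. The following is a minimal sketch, not part of the original record: the function names are hypothetical, and the path passed in is whatever name the downloaded file has locally.

```python
import hashlib

# Checksum published on this page.
EXPECTED_MD5 = "a15d49f8523e887d4a82ea982d92a4dd"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected`."""
    return md5_of_file(path) == expected.lower()
```

Calling matches_expected() with the local path of the downloaded file returns True when the digest matches the published value. A mismatch usually indicates an incomplete or corrupted download.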
Author(s)
McKellar, Cindy
Metadata
Description
Monolingual corpus for SiSwati. The data is provided as a single UTF-8 text file, with one segment per line. The dataset combines existing data sourced for the DSAC-funded Autshumato project with new data sourced for the SADiLaR project "Parallel corpora for English into SiSwati". The corpus comprises 138,651 segments containing 1,536,356 SiSwati words.
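Because the corpus is a single UTF-8 file with one segment per line, it can be read with ordinary line iteration. The sketch below is illustrative only: the function name is hypothetical, and it assumes that blank lines are not segments and that words are separated by whitespace. The tokenisation used to produce the published word count is not stated here, so the totals computed this way may differ slightly from the figures above.

```python
def corpus_stats(path):
    """Count segments (non-empty lines) and whitespace-separated words."""
    segments = 0
    words = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            segment = line.strip()
            if not segment:
                # Assumption: blank lines do not count as segments.
                continue
            segments += 1
            # Assumption: a word is any whitespace-delimited token.
            words += len(segment.split())
    return segments, words
```

Calling corpus_stats() with the local path of the corpus file returns a (segments, words) tuple that can be compared with the totals given in the description.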
Contact person
Tanja Gaustad
Contact person's e-mail address
tanja.gaustad@nwu.ac.za
Publisher(s)
North-West University - Centre for Text Technology (CTexT)