Monolingual Siswati Corpus
Monolingual corpus for SiSwati. The data is provided as a single UTF-8 text file, with each segment on its own line. The dataset contains existing data sourced for the DSAC-funded Autshumato project as well as new data sourced for the SADiLaR project "Parallel corpora for English into SiSwati". The data comprises a total of 138,651 segments with 1,536,356 SiSwati words.
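Because the file holds one segment per line, it can be checked with a few lines of Python. The following is a minimal sketch, not part of the distributed resource: the function name `corpus_stats` and the file name `siswati_corpus.txt` are hypothetical, and words are counted by splitting on whitespace, which is an assumption and may not reproduce the word count reported above.

```python
import io


def corpus_stats(lines):
    """Count non-empty segments and whitespace-separated tokens.

    `lines` is any iterable of strings, e.g. an open text file.
    Whitespace tokenization is an assumption, not the documented method.
    """
    segments = 0
    words = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines, if any
        segments += 1
        words += len(text.split())
    return segments, words


# Small in-memory demonstration with placeholder tokens (not real corpus data).
sample = io.StringIO("aaa bbb\n\nccc ddd eee\n")
segments, words = corpus_stats(sample)
print(segments, words)

# With the real file (hypothetical name), the call would look like:
# with open("siswati_corpus.txt", encoding="utf-8") as f:
#     print(corpus_stats(f))
```

Reading the file line by line keeps memory use low, which is convenient for a corpus of this size.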
Contact person: Tanja Gaustad
Contact person's e-mail: email@example.com
North-West University - Centre for Text Technology (CTexT)