This is not the latest version of this item. A later version is available in the repository.
Monolingual Siswati Corpus
For citation, do not copy the URL from the browser; use the persistent handle URL 'https://hdl.handle.net/20.500.12185/559'.
dc.contact.email | tanja.gaustad@nwu.ac.za | en_ZA |
dc.contact.name | Tanja Gaustad | en_ZA |
dc.contributor.author | McKellar, Cindy | |
dc.date.accessioned | 2022-06-01T08:15:10Z | |
dc.date.available | 2022-06-01T08:15:10Z | |
dc.date.issued | 2022-03-31 | |
dc.description | Monolingual corpus for SiSwati. The data is provided as a single UTF-8 text file, with one segment per line. The dataset contains existing data sourced for the DSAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into SiSwati project. The data comprises a total of 138,651 segments with 1,536,356 SiSwati words. | en_ZA |
dc.format | Text | en_ZA |
dc.format.extent | 138,651 segments with 1,536,356 Siswati words | en_ZA |
dc.format.size | 5.43 MB | en_ZA |
dc.identifier.uri | https://hdl.handle.net/20.500.12185/559 | |
dc.languages | Siswati | en_ZA |
dc.media.category | Monolingual corpus | en_ZA |
dc.media.type | Text | en_ZA |
dc.project | SADiLaR: Parallel corpora for English into Siswati | en_ZA |
dc.publisher | North-West University - Centre for Text Technology (CTexT) | en_ZA |
dc.rights.license | Creative Commons Attribution 4.0 International | en_ZA |
dc.subject | SiSwati, monolingual | en_ZA |
dc.title | Monolingual Siswati Corpus | en_ZA |
dc.version | Version: 1.0 (Final) | en_ZA |
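The record describes the corpus as one UTF-8 text file with one segment per line. The Python sketch below counts segments and words from such lines. It assumes that blank lines are not segments and that a word is any whitespace-delimited token; neither rule is documented in this record, so the counts it produces may not reproduce the published 138,651 segments and 1,536,356 words exactly. The function name `count_segments_and_words` is hypothetical.

```python
from typing import Iterable, Tuple


def count_segments_and_words(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (segments, words) for an iterable of corpus lines.

    Assumptions (not documented in this record): every non-blank line is one
    segment, and a word is any whitespace-delimited token.
    """
    segments = 0
    words = 0
    for line in lines:
        tokens = line.split()  # split on any run of whitespace
        if not tokens:
            continue  # blank lines, such as a trailing empty line, are not counted
        segments += 1
        words += len(tokens)
    return segments, words


if __name__ == "__main__":
    # Placeholder lines for illustration only, not taken from the corpus.
    sample = ["first example segment\n", "\n", "second segment\n"]
    print(count_segments_and_words(sample))  # (2, 5)
```

Any line iterator works as input, including an open text file (`open(path, encoding="utf-8")`), so the whole corpus never has to be held in memory.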
Files
Original bundle
- Name:
- lcontent.SADILAR.MonolingualCorpus(SS).1.0.1.CAM.2022-03-08.ss.zip
- Size:
- 5.43 MB
- Format:
- ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
- Description:
- Monolingual corpus for SiSwati: Single UTF-8 text file
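The archive can be read without first unpacking it to disk. The Python sketch below opens the ZIP with the standard `zipfile` module and yields the lines of the text file inside it. It assumes the archive contains exactly one file (the member name is not listed in this record) and decodes it with `utf-8-sig`, which reads plain UTF-8 and also drops a byte-order mark if one happens to be present. The function name `iter_corpus_segments` is hypothetical.

```python
import io
import zipfile
from typing import IO, Iterator, Union


def iter_corpus_segments(zip_file: Union[str, IO[bytes]]) -> Iterator[str]:
    """Yield each segment (one per line) of the single text file in a ZIP archive.

    Hypothetical helper: assumes the archive holds exactly one regular file.
    """
    with zipfile.ZipFile(zip_file) as archive:
        # Ignore directory entries; the member file name is not documented.
        members = [info for info in archive.infolist() if not info.is_dir()]
        if len(members) != 1:
            raise ValueError(f"expected exactly one file in the archive, found {len(members)}")
        with archive.open(members[0]) as raw:
            # utf-8-sig decodes UTF-8 and drops a leading BOM if there is one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig")
            for line in text:
                # Universal-newline mode turns \r\n into \n; strip the line ending.
                yield line.rstrip("\n")
```

For example, `itertools.islice(iter_corpus_segments("lcontent.SADILAR.MonolingualCorpus(SS).1.0.1.CAM.2022-03-08.ss.zip"), 5)` would yield the first five segments, assuming the archive has been downloaded to the working directory. Yielding lazily keeps memory use flat regardless of corpus size.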
License bundle
- Name:
- license.txt
- Size:
- 3.23 KB
- Format:
- Item-specific license agreed to upon submission
- Description: