
Monolingual Siswati Corpus

dc.contact.email: tanja.gaustad@nwu.ac.za [en_ZA]
dc.contact.name: Tanja Gaustad [en_ZA]
dc.contributor.author: McKellar, Cindy
dc.date.accessioned: 2022-06-01T08:15:10Z
dc.date.available: 2022-06-01T08:15:10Z
dc.date.issued: 2022-03-31
dc.description: Monolingual corpus for SiSwati. The data is given as a single UTF-8 text file, with each segment on its own line. The dataset contains existing data sourced for the DSAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into SiSwati project. The data comprises a total of 138,651 segments with 1,536,356 SiSwati words. [en_ZA]
dc.format: Text [en_ZA]
dc.format.extent: 138,651 segments with 1,536,356 Siswati words [en_ZA]
dc.format.size: 5.43 MB [en_ZA]
dc.identifier.uri: https://hdl.handle.net/20.500.12185/559
dc.languages: Siswati [en_ZA]
dc.media.category: Monolingual corpus [en_ZA]
dc.media.type: Text [en_ZA]
dc.project: SADiLaR: Parallel corpora for English into Siswati [en_ZA]
dc.publisher: North-West University - Centre for Text Technology (CTexT) [en_ZA]
dc.rights.license: Creative Commons Attribution 4.0 International [en_ZA]
dc.subject: SiSwati, monolingual [en_ZA]
dc.title: Monolingual Siswati Corpus [en_ZA]
dc.version: Version: 1.0 (Final) [en_ZA]

Files

Original bundle

Name: lcontent.SADILAR.MonolingualCorpus(SS).1.0.1.CAM.2022-03-08.ss.zip
Size: 5.43 MB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
Description: Monolingual corpus for SiSwati: Single UTF-8 text file
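
Since the record describes the corpus as a single UTF-8 text file with one segment per line, delivered inside a ZIP archive, a reader may want to check the segment and word counts after downloading. The following is a minimal sketch, not part of the record: the function name `corpus_stats` is hypothetical, the name of the text file inside the archive is not stated here (so the code looks for the first `.txt` member, which is an assumption), and "word" is taken to mean a whitespace-delimited token.

```python
import io
import zipfile


def corpus_stats(zip_path):
    """Return (segments, words) for a one-segment-per-line corpus in a ZIP.

    Hypothetical helper: segments are non-empty lines, words are
    whitespace-delimited tokens.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the corpus is the first .txt member of the archive.
        # The member name is not given in the record, so it is looked up.
        text_members = [n for n in archive.namelist() if n.lower().endswith(".txt")]
        if not text_members:
            raise ValueError("no .txt member found in archive")

        segments = 0
        words = 0
        with archive.open(text_members[0]) as raw:
            # utf-8-sig reads plain UTF-8 and also skips a leading BOM if present.
            for line in io.TextIOWrapper(raw, encoding="utf-8-sig"):
                line = line.strip()
                if line:
                    segments += 1
                    words += len(line.split())
    return segments, words
```

Calling `corpus_stats("lcontent.SADILAR.MonolingualCorpus(SS).1.0.1.CAM.2022-03-08.ss.zip")` on the downloaded bundle should give figures comparable to the 138,651 segments and 1,536,356 words listed in the record, although the exact word count may differ if the publisher counted words differently.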

License bundle

Name: license.txt
Size: 3.23 KB
Format: Item-specific license agreed to upon submission