NCHLT Setswana word2vec-CBOW embeddings
Please do not copy the URL from the browser for citation. The correct URL is 'https://hdl.handle.net/20.500.12185/651'
dc.contact.email | Roald.Eiselen@nwu.ac.za
dc.contact.name | Roald Eiselen
dc.contributor.author | Roald Eiselen
dc.contributor.other | Rico Koen
dc.contributor.other | Albertus Kruger
dc.contributor.other | Jacques van Heerden
dc.date.accessioned | 2023-07-28T08:12:10Z
dc.date.accessioned | 2023-05-01
dc.date.available | 2023-07-28T08:12:10Z
dc.date.available | 2023-05-01
dc.date.issued | 2023-05-01
dc.description | Static word embeddings trained with the continuous bag-of-words (CBOW) variant of the word2vec (w2v) architecture (Mikolov et al., 2013). The embeddings provide real-valued vector representations of Setswana words.
dc.format.extent | Training data: Paragraphs: 515,961; Token count: 14,518,437; Vocab size: 33,074; Embedding dimensions: 600
dc.format.size | 71.93MB (Zipped)
dc.identifier.uri | https://hdl.handle.net/20.500.12185/651
dc.language.iso | tn
dc.languages | Setswana
dc.media.category | Word embeddings
dc.media.type | Text
dc.project | NCHLT Text IV
dc.publisher | North-West University; Centre for Text Technology (CTexT)
dc.rights.license | Creative Commons Attribution 4.0 International (CC-BY 4.0)
dc.software.requirements | Python
dc.source | Web
dc.source | Government Documents
dc.title | NCHLT Setswana word2vec-CBOW embeddings
dc.type | Modules
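
The record lists Python as the only software requirement but does not state the file format inside the bundle. The following is a minimal sketch of how such embeddings might be read and compared, assuming (this is not confirmed by the record) that the vectors are stored in the plain-text word2vec format: a header line with the vocabulary size and dimension count, followed by one line per token. The function names load_word2vec_text and cosine are hypothetical, and the three sample tokens and their 4-dimensional vectors are invented placeholders, not data from the resource.

```python
import io
import math


def load_word2vec_text(handle):
    """Parse plain-text word2vec data (ASSUMED format, not confirmed by the record).

    Expects a header line 'vocab_size dimensions', then one line per token:
    'token v1 v2 ... vN'.
    """
    header = handle.readline().split()
    vocab_size, dims = int(header[0]), int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dims + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vocab_size, dims, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented placeholder data standing in for the real file, which per the
# record would have 33,074 tokens and 600 dimensions.
SAMPLE = (
    "3 4\n"
    "tok_a 0.1 0.2 0.3 0.4\n"
    "tok_b 0.1 0.2 0.3 0.5\n"
    "tok_c -0.4 0.3 -0.2 0.1\n"
)

vocab_size, dims, vectors = load_word2vec_text(io.StringIO(SAMPLE))
similarity_ab = cosine(vectors["tok_a"], vectors["tok_b"])
similarity_ac = cosine(vectors["tok_a"], vectors["tok_c"])
```

With the real bundle, the io.StringIO(SAMPLE) argument would be replaced by an opened file handle, for example open(path, encoding="utf-8"), where path points to the extracted vector file. If the bundle turns out to use a binary or library-specific format, a dedicated loader would be needed instead.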