
NCHLT Sesotho word2vec-CBOW embeddings
Static word embeddings trained with the continuous bag-of-words (CBOW) variant of the word2vec (w2v) architecture (Mikolov et al., 2013). The embeddings provide real-valued vector representations of Sesotho words.
Roald Eiselen
Roald.Eiselen@nwu.ac.za
North-West University; Centre for Text Technology (CTexT)
Creative Commons Attribution 4.0 International (CC-BY 4.0)
Sesotho
Roald Eiselen
Rico Koen; Albertus Kruger; Jacques van Heerden
https://hdl.handle.net/20.500.12185/650
Text
Modules
Word embeddings
Training data: Paragraphs: 535,853; Token count: 17,425,650; Vocabulary size: 34,888; Embedding dimensions: 600
75.87MB (Zipped)
NCHLT Text IV
Python
Web; Government Documents
st
2023-07-28T08:12:08Z; 2023-05-01
2023-05-01
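
As a rough illustration of how static word embeddings like these are typically used, the sketch below looks up word vectors in a table and compares them by cosine similarity. The vectors are made-up 4-dimensional toy values, not taken from this resource (which has 600 dimensions and a vocabulary of 34,888 entries), and the names `toy_embeddings`, `cosine_similarity` and `most_similar` are hypothetical. The file format of the distributed embeddings is not stated in this record, so loading is deliberately left out.

```python
import math

# Toy vectors with invented values, for illustration only.
# The real resource uses 600 dimensions per word.
toy_embeddings = {
    "ntja": [0.9, 0.1, 0.0, 0.2],   # "dog"
    "katse": [0.8, 0.2, 0.1, 0.3],  # "cat"
    "metsi": [0.0, 0.9, 0.7, 0.1],  # "water"
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings):
    """Return (neighbour, score) for the closest other word in the table."""
    query = embeddings[word]
    candidates = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return max(candidates, key=lambda pair: pair[1])


neighbour, score = most_similar("ntja", toy_embeddings)
print(neighbour, round(score, 3))
```

Because the embeddings are static, each word has exactly one vector regardless of context, so a lookup table plus a similarity function is enough for this kind of nearest-neighbour query.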


This item appears in the following Collection(s)

  • Resource Catalogue [350]
    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
