In 2009, the South African National HLT Network (NHN) funded a technology audit to form a clear profile of research and development activities in the human language technology (HLT) field in South Africa. This audit was used as the basis for the RMA Index, a list of South African resources with the relevant metadata (such as developer details and specifications). Some of these resources are also included in the RMA Catalogue and are therefore available for download.

Collections in this community

  • Resource Catalogue [333]

    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
  • Resource Index [411]

    A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
  • Student Data Repository [6]

    A collection of language resources made available as part of the output of postgraduate study programs.

Recent Submissions

  • Afrikaans lexical blends dataset 

    Trollip, Benito, et al. (North-West University, 2023-12)
    This is a dataset of Afrikaans blend constructions that have been collected and analysed using the Levenshtein distance metric. This dataset serves as the ...
  • USAf National Language Resources Audit 2023 

    Van Dyk, T.J., et al. (South African Centre for Digital Language Resources, 2023-10)
    This report documents the findings of a comprehensive language resources audit conducted by the South African Centre for Digital Language Resources ...
  • Generic Multilingual Academic Wordlists with Definitions 

    Van Dyk, Tobie (SADiLaR; ICELDA, 2022)
    This multilingual generic academic wordlist has been developed to serve as a resource to students to assist with building a vocabulary and decoding ...
  • NCHLT isiZulu word2vec-Skipgram embeddings 

    Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
  • NCHLT isiXhosa word2vec-Skipgram embeddings 

    Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
  • NCHLT Tshivenḓa word2vec-Skipgram embeddings 

    Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
  • NCHLT Xitsonga word2vec-Skipgram embeddings 

    Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
  • NCHLT Setswana word2vec-Skipgram embeddings 

    Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
  • NCHLT Sesotho word2vec-Skipgram embeddings 

    Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
  • NCHLT Siswati word2vec-Skipgram embeddings 

    Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
