    • Morphologically annotated corpus for Setswana 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Setswana converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Tshivenḓa 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Tshivenḓa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Xitsonga 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Xitsonga converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • POS annotated corpus with 5 different text types for isiZulu 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      This is a POS annotated corpus with 5 different text types for isiZulu. The text types included are: - CAPS gr12 (Academic) - https://www.educat ...
    • POS annotated corpus in 5 different genres for Sepedi 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      This corpus contains POS annotated data in 5 different genres for Sepedi. The text types included are: - CAPS gr12 (Academic) - https://www.educ ...
    • Multilingual Linguistic Terminology 

      Griesel, Marissa (UNISA, 2022-09-20)
      Multilingual Linguistic Terminology Project Termbanks of Linguistic terminology for South African languages Version 1.0 https://linguistictermino ...
    • Afrikaans lexical blends dataset 

      Trollip, Benito, et al. (North-West University, 2023-12)
      This is a dataset of Afrikaans blend constructions that have been collected and analysed using the Levenshtein distance metric. This dataset serves as the ...
    • USAf National Language Resources Audit 2023 

      Van Dyk, T.J., et al. (South African Centre for Digital Language Resources, 2023-10)
      This report documents the findings of a comprehensive language resources audit conducted by the South African Centre for Digital Language Resources ...
    • Generic Multilingual Academic Wordlists with Definitions 

      Van Dyk, Tobie (SADiLaR; ICELDA, 2022)
      This multilingual generic academic wordlist has been developed to serve as a resource to students to assist with building a vocabulary and decoding ...
    • NCHLT isiZulu word2vec-Skipgram embeddings 

      Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
      Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...