    • NCHLT Afrikaans fastText-CBoW embeddings 

      Eiselen, Roald (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
      Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
    • South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) version 2023-03 

      Van Dyk, Tobie (ICELDA; SADiLaR, 2023-03)
      The South African Multilingual Learner Corpus of Academic Texts (SAMuLCAT) is a multi-genre, multi-level learner corpus developed by the Inter-institutional ...
    • Autshumato Monolingual Setswana Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato Monolingual Sesotho Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato Monolingual Sepedi Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for Sepedi. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato Monolingual isiZulu Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for isiZulu. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato Monolingual Afrikaans Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
    • Autshumato English-Xitsonga Parallel Corpora 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Aligned parallel corpora for the language pair English-Xitsonga. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
    • Autshumato English-Setswana Parallel Corpora 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Aligned parallel corpora for the language pair English-Setswana. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
    • Autshumato English-Sesotho Parallel Corpora 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
      Aligned parallel corpora for the language pair English-Sesotho. The data is given as two separate UTF-8 text files, with each aligned segment on a ...