Now showing items 135-154 of 349

    • Lwazi Xitsonga ASR corpus 

      Charl van Heerden, et al. (Meraka Institute, CSIR, 2013-04-02) ~ Resource Catalogue
      Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
    • Lwazi Xitsonga Pronunciation Dictionary 

      Marelie Davel (Meraka Institute, CSIR, 2013-04-01) ~ Resource Catalogue
      General phonemic pronunciations for frequently occurring words in SA languages. Dictionaries were developed to be practically usable for speech technology ...
    • Lwazi Xitsonga TTS corpus 

      Daniel van Niekerk, et al. (Meraka Institute, CSIR, 2013-03-27) ~ Resource Catalogue
      Orthographic and phonemically aligned transcriptions.
    • Monolingual isiXhosa corpus 

      McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
      Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data ...
    • Morphologically annotated corpus for isiNdebele 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...
    • Morphologically annotated corpus for isiXhosa 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for isiZulu 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Sepedi 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Sepedi converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Sesotho 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Sesotho converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Setswana 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Setswana converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Siswati 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Tshivenḓa 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Tshivenḓa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Xitsonga 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Xitsonga converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Multilingual Linguistic Terminology 

      Griesel, Marissa (UNISA, 2022-09-20)
      Multilingual Linguistic Terminology Project: termbanks of linguistic terminology for South African languages. Version 1.0. https://linguistictermino ...
    • NCHLT Afrikaans Text Corpora 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
    • NCHLT Siswati Morphological Decomposer 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Morphological decomposer developed during the NCHLT Text project.
    • NCHLT Afrikaans Annotated Text Corpora 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
    • NCHLT Afrikaans Auxiliary Speech Corpus 

      Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
      The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
    • NCHLT Afrikaans fastText-CBoW embeddings 

      Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
      Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
    • NCHLT Afrikaans fastText-Skipgram embeddings 

      Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
      Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...