    • Autshumato Monolingual English Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2023-10-30)
      Monolingual corpus for South African English. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically ...
    • African Wordnet version 1.0 

      Griesel, Marissa (UNISA, 2022-09-20)
      Developed using the expand model with Princeton WordNet 3.1 as basis. Please see https://africanwordnet.wordpress.com/ for all details on the project. ...
    • Ex Machina: Using NLP and statistical learning models to model eyewitness statements and choosing behaviour 

      Nortje, Alicia, et al. (SADiLaR, 2019-05-07)
      This curated database includes data from various empirical studies where eyewitness statements and descriptions were collected. The original studies, ...
    • Autshumato English-Tshivenḓa Parallel Corpora 

      McKellar, Cindy (North-West University; Centre for Text Technology (CTexT), 2023-12-12)
      Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced ...
    • Autshumato Monolingual Tshivenḓa Corpus 

      McKellar, Cindy (North-West University; Centre for Text Technology (CTexT), 2023-12-12)
      Monolingual corpus for Tshivenḓa. The data is given as a single UTF-8 text file, with each segment on a newline.
    • Morphologically annotated corpus for isiNdebele 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...
    • Morphologically annotated corpus for isiXhosa 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for isiZulu 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Siswati 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Sesotho 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Sesotho converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...