    • Autshumato Monolingual English Corpus 

      McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2023-10-30)
      Monolingual corpus for South African English. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically ...
    • IsiZulu Second Language Learner Speech Corpus 

      O'Neil, Alexandra, et al. (Indiana University, 2024)
      This corpus is specifically designed to assist in evaluating the performance of pronunciation feedback tools for second language learning. The corpus ...
    • African Wordnet version 1.0 

      Griesel, Marissa (UNISA, 2022-09-20)
      Developed using the expand model with Princeton WordNet 3.1 as basis. Please see https://africanwordnet.wordpress.com/ for all details on the project. ...
    • Ex Machina: Using NLP and statistical learning models to model eyewitness statements and choosing behaviour 

      Nortje, Alicia, et al. (SADiLaR, 2019-05-07)
      This curated database includes data from various empirical studies where eyewitness statements and descriptions were collected. The original studies, ...
    • Autshumato English-Tshivenḓa Parallel Corpora 

      McKellar, Cindy (North-West University; Centre for Text Technology (CTexT), 2023-12-12)
      Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced ...
    • Autshumato Monolingual Tshivenḓa Corpus 

      McKellar, Cindy (North-West University; Centre for Text Technology (CTexT), 2023-12-12)
      Monolingual corpus for Tshivenḓa. The data is given as a single UTF-8 text file, with each segment on a newline.
    • Morphologically annotated corpus for isiNdebele 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...
    • Morphologically annotated corpus for isiXhosa 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for isiZulu 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
    • Morphologically annotated corpus for Siswati 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
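
The Autshumato monolingual corpora listed above are described as a single UTF-8 text file with each segment on a newline. A minimal sketch of reading such a file in Python might look like the following. The function name `iter_segments`, the file name `corpus_sample.txt`, and the sample contents are assumptions made for illustration only; they do not come from the corpus documentation, and skipping blank lines is a choice made here, not something the listing specifies.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Union


def iter_segments(path: Union[str, Path]) -> Iterator[str]:
    """Yield each non-empty segment from a one-segment-per-line UTF-8 file.

    Hypothetical helper: the name and the blank-line handling are
    illustrative assumptions, not part of any corpus tooling.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Strip only the line terminator so the segment text is kept intact.
            segment = line.rstrip("\r\n")
            if segment.strip():
                yield segment


# Demonstration on a small stand-in file (placeholder content, not corpus data).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "corpus_sample.txt"
    sample.write_text(
        "First segment.\n\nTshivenḓa segment.\n", encoding="utf-8"
    )
    segments = list(iter_segments(sample))

print(len(segments))
```

Reading the file lazily, one line at a time, keeps memory use low for large corpora; passing `encoding="utf-8"` explicitly avoids depending on the platform default, which matters for characters such as ḓ.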