Now showing items 310-329 of 349

    • NCHLT Xitsonga Named Entity Annotated Corpus 

      N.C.P. Golele, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue
      Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.
    • NCHLT Xitsonga Phrase Chunk Annotated Corpus 

      N.C.P. Golele, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue
      Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
    • NCHLT Xitsonga RoBERTa language model 

      Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
      Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
    • NCHLT Xitsonga Speech Corpus 

      Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
      Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
    • NCHLT Xitsonga Text Corpora 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
    • NCHLT Xitsonga word2vec-CBOW embeddings 

      Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
      Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides ...
    • NCHLT Xitsonga word2vec-Skipgram embeddings 

      Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)
      Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
    • NCHLT-inlang Pronunciation Dictionaries 

      Marelie Davel (Meraka Institute, CSIR; North-West University, 2014-07-04) ~ Resource Catalogue
      Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations ...
    • PHONAAS 

      Wikus Pienaar, et al. (North-West University; Centre for Text Technology (CTexT), 2015-06-30) ~ Resource Catalogue
      PHONAAS is a graphical user interface (GUI) tool, written in Perl and GTK2, using the R programming language and PRAAT to extract vowel formant data.
    • POS annotated corpus in 5 different genres for Sepedi 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      This corpus contains POS annotated data in 5 different genres for Sepedi. The text types included are: - CAPS gr12 (Academic) - https://www.educ ...
    • POS annotated corpus with 5 different text types for isiZulu 

      Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
      This is a POS annotated corpus with 5 different text types for isiZulu. The text types included are: - CAPS gr12 (Academic) - https://www.educat ...
    • Read Afrikaans Normal/ Read Afrikaans Fast 

      Wissing, Daan (Centre for Text Technology, North-West University, 2019-05-28) ~ Resource Catalogue
      The corpus contains speech of 127 mother tongue speakers of Afrikaans. Every speaker was asked to read a text fragment from a book or a newspaper (about ...
    • Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad 

      Coetzer, G.C., et al. (North-West University, 2019-02-18) ~ Resource Catalogue
      Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad
    • SADE Municipality Hotline IVR Prompts 

      Charl van Heerden, et al. (North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC, 2015-09-07) ~ Resource Catalogue
      Audio and corresponding transcriptions for the SADE Municipality Hotline IVR prompts in English, Sesotho and isiZulu. The English SADE municipality ...
    • SADE v.1.0 Platform 

      Charl van Heerden, et al. (North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC, 2015-09-07) ~ Resource Catalogue
      End-to-end directory enquiries application (using Asterisk, UniMRCP and Kaldi). The municipality hotline example is implemented as an Asterisk Gateway ...
    • Sepedi Custom Dictionary for Government Domain 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ Resource Catalogue
      Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
    • Sesotho Custom Dictionary for Government Domain 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ Resource Catalogue
      Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
    • Sesotho function word speech data 

      Wissing, Daan (Centre for Text Technology, North-West University, 2019-05-28) ~ Resource Catalogue
      The primary aim of this speech data set was to study the role of tone in the function word "ke" in the minimal pairs "ke motho" and in the function word ...
    • Sesotho Genre Classification Corpus 

      Gerhard van Huyssteen, et al. (Trifonius, 2013-06-19) ~ Resource Catalogue
      Contains training and testing data for Genre Classification for Sesotho.
    • Sesotho sa Leboa Genre Classification Corpus 

      Gerhard van Huyssteen, et al. (Trifonius, 2013-06-19) ~ Resource Catalogue
      Contains training and testing data for Genre Classification for Sesotho sa Leboa.