    • NCHLT Tshivenda Phrase Chunk Annotated Corpus 

      S.L. Tshikota, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue
      Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
    • NCHLT Tshivenda Speech Corpus 

      Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
      Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
    • NCHLT Tshivenda Text Corpora 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Collection of source text documents, genre-classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
    • NCHLT Xitsonga Morphological Decomposer 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Morphological decomposer developed during the NCHLT Text project.
    • NCHLT Xitsonga Annotated Text Corpora 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Lemmatised, part-of-speech-tagged and morphologically analysed corpora developed during the NCHLT Text project.
    • NCHLT Xitsonga Auxiliary Speech Corpus 

      Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue
      The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
    • NCHLT Xitsonga Lemmatiser 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Lemmatiser developed during the NCHLT Text project.
    • NCHLT Xitsonga Named Entity Annotated Corpus 

      N.C.P. Golele, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue
      Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.
    • NCHLT Xitsonga Phrase Chunk Annotated Corpus 

      N.C.P. Golele, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue
      Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
    • NCHLT Xitsonga Speech Corpus 

      Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue
      Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
    • NCHLT Xitsonga Text Corpora 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue
      Collection of source text documents, genre-classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
    • NCHLT-inlang Pronunciation Dictionaries 

      Marelie Davel (Meraka Institute, CSIR; North-West University, 2014-07-04) ~ Resource Catalogue
      Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations ...
    • PHONAAS 

      Wikus Pienaar, et al. (North-West University; Centre for Text Technology (CTexT), 2015-06-30) ~ Resource Catalogue
      PHONAAS is a graphical user interface (GUI) tool, written in Perl and GTK2, using the R programming language and PRAAT to extract vowel formant data.
    • Read Afrikaans Normal / Read Afrikaans Fast 

      Daan Wissing (Centre for Text Technology, North-West University, 2019-05-28) ~ Resource Catalogue
      The corpus contains speech of 127 mother tongue speakers of Afrikaans. Every speaker was asked to read a text fragment from a book or a newspaper (about ...
    • SADE Municipality Hotline IVR Prompts 

      Charl van Heerden, et al. (North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC, 2015-09-07) ~ Resource Catalogue
      Audio and corresponding transcriptions for the SADE Municipality Hotline IVR prompts in English, Sesotho and isiZulu. The English SADE municipality ...
    • SADE v.1.0 Platform 

      Charl van Heerden, et al. (North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC, 2015-09-07) ~ Resource Catalogue
      End-to-end directory enquiries application (using Asterisk, UniMRCP and Kaldi). The municipality hotline example is implemented as an Asterisk Gateway ...
    • Sepedi Custom Dictionary for Government Domain 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ Resource Catalogue
      Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
    • Sesotho Custom Dictionary for Government Domain 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2013-02-22) ~ Resource Catalogue
      Word list developed as a custom dictionary for use in the spelling checkers as part of the spelling checker project for the Department of Arts and ...
    • Sesotho function word speech data 

      Daan Wissing (Centre for Text Technology, North-West University, 2019-05-28) ~ Resource Catalogue
      The primary aim of this speech data set was to study the role of tone in the function word "ke" in the minimal pairs "ke motho" and in the function word ...
    • Sesotho Genre Classification Corpus 

      Gerhard van Huyssteen, et al. (Trifonius, 2013-06-19) ~ Resource Catalogue
      Contains training and testing data for Genre Classification for Sesotho.