Now showing items 61-80 of 389

    • Autshumato PDF Text Extractor 

      Wildrich Fourie (North-West University; Centre for Text Technology (CTexT), 2013-06-20) ~ Resource Catalogue
      Utility application for extracting text out of a PDF document. The pages can also be extracted as images.
    • Autshumato Sesotho sa Leboa-English Translation Memory 

      Cindy McKellar, et al. (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue
      Translation memory from Sesotho sa Leboa to English (EN-GB), in the government domain for use in the Autshumato ITE application.
    • Autshumato Setswana Monolingual Corpora 

      Cindy McKellar (North-West University; Centre for Text Technology (CTexT), 2016-10-28) ~ Resource Catalogue
      Setswana monolingual corpus as a deliverable of the Autshumato project. The data is given as a UTF-8 text file, with each sentence on a new line.
    • Autshumato Text Anonymiser 

      Martin Schlemmer, et al. (North-West University; Centre for Text Technology (CTexT), 2013-06-20) ~ Resource Catalogue
      Anonymises text by classifying and replacing sensitive information such as person names, business names, place names, monetary values, phone numbers, ...
    • Autshumato TMS 

      Martin Schlemmer, et al. (North-West University; Centre for Text Technology (CTexT), 2013-06-19) ~ Resource Catalogue
      Terminology Management System. Web application used by Terminologists and Administrators to capture, edit and export terminology.
    • Autshumato TMX Integrator 

      Martin Schlemmer, et al. (North-West University; Centre for Text Technology (CTexT), 2013-06-20) ~ Resource Catalogue
      Utility to merge multiple translation memories over a network using Subversion.
    • Autshumato Xitsonga Frequency Word List 

      Wikus Pienaar, et al. (North-West University; Centre for Text Technology (CTexT), 2014-12-12) ~ Resource Catalogue
      A list of the most frequent Xitsonga words as a deliverable of the Autshumato project.
    • Autshumato Xitsonga Monolingual Corpora 

      Wikus Pienaar, et al. (North-West University; Centre for Text Technology (CTexT), 2014-12-12) ~ Resource Catalogue
      Xitsonga monolingual corpus as a deliverable of the Autshumato project. The data is given as a UTF-8 text file, with each sentence on a new line.
    • Bambara Monolingual Children First Language Acquisition (Babbling & First Words) 

      CISSE, Ibrahima Abdoul Hayou (Ibrahima Abdoul Hayou CISSE, 2010)
      Dataset contains videos of children interacting with caregivers. Languages included: Bambara/Bamanakan/Dioula/Mande
    • Bilingual English-isiXhosa corpus 

      McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue
      Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with each segment on a ...
    • Boomerang v1.0 

      Unknown author (2013-07-01) ~ Resource Index
      Performs outbound call campaigns and executes JavaScript-scripted IVR callflows especially for marketing companies with extensive client ...
    • Bukantswe Sesotho-English Bilingual Dictionary 

      J. A. K. Olivier (North-West University, 2016-07-07) ~ Resource Catalogue
      Bilingual English-Sesotho dictionary. This dataset represents a basic Sesotho dictionary compiled in the creation of a Sesotho language resource. The ...
    • Calomo 

      Menno van Zaanen (North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ Resource Index
      Calomo is a hyphenator for Afrikaans, which can be implemented in any NLP system. It takes as input a string, and produces as output an analysed string, ...
    • CGE's Afrikaans Gender Terminology List 

      Commission for Gender Equality (CGE), et al. (Commission for Gender Equality (CGE), 2021-04)
      CGE's Afrikaans Gender Terminology List is a list of terms, either words or phrases, related to the promotion of gender equality. All 436 words or phrases ...
    • CKarma 

      Unknown author (North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ Resource Index
      CKarma is a compound analyser for Afrikaans, to be used for the detection of word boundaries within compounds. It takes as input a string, and produces ...
    • Code-switching among bilingual Afrikaans-Dutch children 

      Rabe, Monique (North-West University, 2021)
      To describe the code-switching in the speech of pre-school, bilingual Afrikaans-Dutch children, novel language data of eight preschool children between ...
    • Combination Tagger 

      Unknown author (North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ Resource Index
      The combination tagger framework uses MBT, SVM, MXPOST and TnT. Each tagger receives a weight by which it can vote for a tag.
    • CompanyCall v1.0 

      Unknown author (2013-07-01) ~ Resource Index
      Routes calls based on spoken company name, South African names, ses ASR name re ??, which domain) directory assistance, 08606companycall.co.za.
    • Corpus of multilingual code-switched soap opera speech 

      van der Westhuizen, Ewald, et al. (Stellenbosch University, 2020-02-28)
      The corpus comprises 26.9 hours of annotated multilingual speech that contains examples of code-switching in isiZulu, isiXhosa, Setswana, Sesotho and ...
    • CorpusCatcher 

      Unknown author (Translate.org.za, 2015-01-28) ~ Resource Index
      Corpus Catcher is a tool that is designed to crawl the web to retrieve data for inclusion in a corpus. It makes use of seed documents/wordlists to ...