A collection of language resource metadata, gathered mostly during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.

Recent Submissions

  • Afrikaans text unit identification data 

    Puttkammer, Martin (Centre for Text Technology, North-West University, 2006) ~ Resource Catalogue
    This dataset was developed during a master's degree and used in the development of a text unit identifier capable of tagging sentences, named entities, ...
  • Autshumato Machine Translation Evaluation Set 

    McKellar, Cindy Arlene (North-West University; Centre for Text Technology (CTexT); Department of Arts and Culture, South Africa, 2017-12-15) ~ Resource Catalogue
    Comparable evaluation data for use in automatic machine translation evaluations. The evaluation set consists of 500 sentences translated separately by ...
  • Qfrency TTS phone mappings 

    Unknown author (CSIR, 2018-03-02) ~ Resource Index
    TTS phone mappings between IPA, XSAMPA and our Qfrency internal format, standardised across all 11 SA languages. To be used in conjunction with the Lwazi ...
  • GF Miniature Resource for Tswana 

    Laurette Marais, Meraka, et al. (HLT Research Group, Meraka Institute, CSIR, 2018-03-06) ~ Resource Index
    This miniature resource grammar parses and generates main clause sentences in various tenses, moods and aspects in Tswana. The lexicon is limited, but ...
  • Qfrency TTS Afrikaans Maryna recordings 

    Unknown author (CSIR, 2018-03-07) ~ Resource Index
    Studio-quality recordings of text-to-speech data in Afrikaans and some English utterances. Professional Afrikaans first-language voice artist.
  • Qfrency TTS Afrikaans Kobus recordings 

    Unknown author (CSIR, 2018-03-07) ~ Resource Index
    Studio-quality recordings of text-to-speech data in Afrikaans and some English utterances. Professional Afrikaans first-language voice artist.
  • NCHLT Text Web Services 

    Roald Eiselen (SADiLaR; North-West University, 2018-03-01) ~ Resource Index
    A web service that provides access to seven core technologies in ten South African languages, including: * Tokenisers * Sentence separators * ...
  • Autshumato Machine Translation Web Service (MTWS) 

    Wildrich Fourie, et al. (Centre for Text Technology; North-West University, 2018-03-01) ~ Resource Index
    The MTWS is a unified interface through which anyone can gain access to the MT systems developed in the Autshumato project. It can provide sentence, ...
  • TsnMorph 

    Laurette Pretorius, et al. (University of South Africa, 2018-03-01) ~ Resource Index
    Finite-state morphological analyser for Tswana, based on the Xerox toolkit and compatible with foma.
  • ZulMorph 

    Laurette Pretorius, et al. (University of South Africa, 2018-03-01) ~ Resource Index
    Finite-state morphological analyser for Zulu, based on the Xerox toolkit and compatible with foma.
