A collection of language resource metadata, gathered mostly during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.

Recent Submissions

  • Autshumato Monolingual Setswana Corpus 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
  • Autshumato Monolingual Sesotho Corpus 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Monolingual corpus for Sesotho. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
  • Autshumato Monolingual Sepedi Corpus 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Monolingual corpus for Sepedi. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
  • Autshumato Monolingual isiZulu Corpus 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Monolingual corpus for isiZulu. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
  • Autshumato Monolingual Afrikaans Corpus 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and ...
  • Autshumato English-Xitsonga Parallel Corpora 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Aligned parallel corpora for the language pair English-Xitsonga. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
  • Autshumato English-Setswana Parallel Corpora 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Aligned parallel corpora for the language pair English-Setswana. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
  • Autshumato English-Sesotho Parallel Corpora 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Aligned parallel corpora for the language pair English-Sesotho. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
  • Autshumato English-Sepedi Parallel Corpora 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Aligned parallel corpora for the language pair English-Sepedi. The data is given as two separate UTF-8 text files, with each aligned segment on a newline. ...
  • Autshumato English-isiZulu Parallel Corpora 

    McKellar, Cindy (CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
    Aligned parallel corpora for the language pair English-isiZulu. The data is given as two separate UTF-8 text files, with each aligned segment on a ...
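
The descriptions above say each monolingual corpus is one UTF-8 text file with one segment per line, and each parallel corpus is two such files aligned line by line. A minimal Python sketch of reading data in that layout might look like the following. The function names are hypothetical, and the check that both files have the same number of lines is an assumption about the format, not something the listing states.

```python
def read_segments(path):
    """Return the segments of a UTF-8 file that holds one segment per line."""
    with open(path, encoding="utf-8") as handle:
        # Strip only the line terminator so spacing inside a segment is kept.
        return [line.rstrip("\r\n") for line in handle]


def read_parallel(source_path, target_path):
    """Pair segments from two line-aligned files by line number."""
    source = read_segments(source_path)
    target = read_segments(target_path)
    # Assumption: aligned files have equal line counts; fail loudly otherwise.
    if len(source) != len(target):
        raise ValueError(
            f"line counts differ: {len(source)} vs {len(target)}"
        )
    return list(zip(source, target))
```

Calling `read_parallel` with the paths of the two files from one parallel corpus download would return a list of (source, target) segment pairs. The actual file names inside each download are not given in this listing, so they have to be taken from the resource itself.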
