Browsing Resource Index by Title

NCHLT Text Web Services

Roald Eiselen (SADiLaR; North-West University, 2018-03-01) ~ Resource Index

A web service that provides access to seven core technologies in ten South African languages, including tokenisers, sentence separators, ...

NCHLT Tshivenda Morphological Decomposer

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Morphological decomposer developed during the NCHLT Text project.

NCHLT Tshivenda Annotated Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.

NCHLT Tshivenda Auxiliary Speech Corpus

Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue

The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...

NCHLT Tshivenda Lemmatiser

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatiser developed during the NCHLT Text project. Available in the Readme.txt - Input format: Text data (encoding: UTF-8 without BOM), one ...

NCHLT Tshivenda Named Entity Annotated Corpus

S.L. Tshikota, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.

NCHLT Tshivenda Phrase Chunk Annotated Corpus

S.L. Tshikota, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...

NCHLT Tshivenda Speech Corpus

Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue

Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.

NCHLT Tshivenda Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...

NCHLT Xitsonga Morphological Decomposer

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Morphological decomposer developed during the NCHLT Text project.

NCHLT Xitsonga Annotated Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.

NCHLT Xitsonga Auxiliary Speech Corpus

Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue

The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...

NCHLT Xitsonga Lemmatiser

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatiser developed during the NCHLT Text project. Available in the Readme.txt - Input format: Text data (encoding: UTF-8 without BOM), one ...

NCHLT Xitsonga Named Entity Annotated Corpus

N.C.P. Golele, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.

NCHLT Xitsonga Phrase Chunk Annotated Corpus

N.C.P. Golele, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...

NCHLT Xitsonga Speech Corpus

Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue

Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.

NCHLT Xitsonga Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...

NCHLT-inlang Pronunciation Dictionaries

Marelie Davel (Meraka Institute, CSIR; North-West University, 2014-07-04) ~ Resource Catalogue

Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations ...

NHN Zulu corpora

Unknown author (University of the Witwatersrand, 2015-01-07) ~ Resource Index

A first step to building a corpus of POS-annotated Zulu texts.

NoteTaker (vSep2009)

Unknown author (Meraka Institute, CSIR, 2013-07-01) ~ Resource Index

Replaces a number of dedicated devices for the blind. The NoteTaker is a communication and computing device for the blind and visually impaired ...