Browsing Resource Index by Title

NCHLT Setswana Speech Corpus

Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue

Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.

NCHLT Setswana Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...

NCHLT Siswati Annotated Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.

NCHLT Siswati Auxiliary Speech Corpus

Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue

The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...

NCHLT Siswati Lemmatiser

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatiser developed during the NCHLT Text project. \n\n Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...

NCHLT Siswati Named Entity Annotated Corpus

B.B. Malangwane, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.

NCHLT Siswati Phrase Chunk Annotated Corpus

B.B. Malangwane, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...

NCHLT Siswati Speech Corpus

Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue

Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.

NCHLT Siswati Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...

NCHLT South African Language Identifier

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

A graphical user interface and command line tool to automatically classify a document, paragraph, sentence or phrase as one of the eleven official South ...

NCHLT Speech II Corpus

Jaco Badenhorst, et al. (Meraka Institute, CSIR, 2016-05-09) ~ Resource Catalogue

The speech corpus generated from aligned audio samples from National Parliament using Hansard transcriptions are provided in terms of audio and ...

NCHLT Tagger

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

A graphical user interface and command line tool to automatically annotate running text with one or more linguistic tags:\n* Part of Speech\n* Named ...

NCHLT Text Web Services

Roald Eiselen (SADiLaR; North-West University, 2018-03-01) ~ Resource Index

A web service that provides access to seven core technologies in ten South African languages, including: * Tokenisers * Sentence separators * ...

NCHLT Tshivenda Morphological Decomposer

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Morphological decomposer developed during the NCHLT Text project.

NCHLT Tshivenda Annotated Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.

NCHLT Tshivenda Auxiliary Speech Corpus

Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue

The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...

NCHLT Tshivenda Lemmatiser

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatiser developed during the NCHLT Text project. \n\n Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...

NCHLT Tshivenda Named Entity Annotated Corpus

S.L. Tshikota, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.

NCHLT Tshivenda Phrase Chunk Annotated Corpus

S.L. Tshikota, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...

NCHLT Tshivenda Speech Corpus

Charl van Heerden, et al. (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue

Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.