WAT quotation collection
(N/A, 2022-10-14)
Collection of short quotations/excerpts from a variety of books (fiction, non-fiction & academic).
Linguistically enriched corpora for conjunctively written South African languages
(North-West University, Centre for Language Technology (CTexT), 2021-09)
This resource contains linguistically annotated data for four official South African languages with a conjunctive orthography from the Nguni family ...
NCHLT isiNdebele Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30)
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Xitsonga Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30)
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Xitsonga FLAIR-backward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
Multilingual Illustrated Dictionary with interactive games
(Centre for Text Technology (CTexT); Pharos Dictionaries, 2013-07-01)
Multilingual Illustrated Dictionary with interactive games and pronunciation for seven of South Africa's official languages
Pretoria Sepedi Corpus POS tagged
(Department of African Languages - University of Pretoria, 2015-01-27)
The tagged Pretoria Sepedi Corpus for part-of-speech (POS) tagging. Intended for grammatical, morphological, lexical and syntactic analysis.
NCHLT Sesotho Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29)
Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
NCHLT isiXhosa Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30)
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
Autshumato Monolingual Afrikaans Corpus
(CTexT® (Centre for Text Technology, North-West University), 2022-09-30)
Monolingual corpus for Afrikaans. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and ...