Browsing Resource Catalogue by Title

Now showing items 138-157 of 349

Monolingual isiXhosa corpus

McKellar, Cindy (North-West University - Centre for Text Technology (CTexT), 2019-11-30) ~ Resource Catalogue

Monolingual corpus for isiXhosa. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data ...
Morphologically annotated corpus for isiNdebele

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...
Morphologically annotated corpus for isiXhosa

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for isiZulu

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Sepedi

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Sepedi converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Sesotho

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Sesotho converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Setswana

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Setswana converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Siswati

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Tshivenḓa

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Tshivenḓa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Morphologically annotated corpus for Xitsonga

Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)

NCHLT corpus of morphologically annotated tokens in Xitsonga converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
Multilingual Linguistic Terminology

Griesel, Marissa (UNISA, 2022-09-20)

Multilingual Linguistic Terminology Project: termbanks of linguistic terminology for South African languages. Version 1.0. https://linguistictermino ...
NCHLT Afrikaans Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Siswati Morphological Decomposer

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Morphological decomposer developed during the NCHLT Text project.
NCHLT Afrikaans Annotated Text Corpora

Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
NCHLT Afrikaans Auxiliary Speech Corpus

Febe de Wet, et al. (CSIR Meraka Institute; North-West University, 2019-06-01) ~ Resource Catalogue

The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
NCHLT Afrikaans fastText-CBoW embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
NCHLT Afrikaans fastText-Skipgram embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...
NCHLT Afrikaans FLAIR-backward embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
NCHLT Afrikaans FLAIR-forward embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
NCHLT Afrikaans GloVe embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Static word embedding model based on the Global Vectors architecture (Pennington et al., 2014). The embeddings provide real-valued vector representations ...
