Browsing Resource Catalogue by Title
Now showing items 304-323 of 350
NCHLT Xitsonga Auxiliary Speech Corpus
(CSIR Meraka Institute; North-West University, 2019-06-01) The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in ...
NCHLT Xitsonga fastText-CBoW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding ...
NCHLT Xitsonga fastText-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...
NCHLT Xitsonga FLAIR-backward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
NCHLT Xitsonga FLAIR-forward embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector ...
NCHLT Xitsonga GloVe embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Static word embedding model based on the Global Vectors architecture (Pennington et al., 2014). The embeddings provide real-valued vector representations ...
NCHLT Xitsonga Lemmatiser
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) Lemmatiser developed during the NCHLT Text project. Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...
NCHLT Xitsonga Named Entity Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.
NCHLT Xitsonga Phrase Chunk Annotated Corpus
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...
NCHLT Xitsonga RoBERTa language model
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
NCHLT Xitsonga Speech Corpus
(Meraka Institute, CSIR; North-West University, 2014-07-08) Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
NCHLT Xitsonga Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
NCHLT Xitsonga word2vec-CBOW embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides ...
NCHLT Xitsonga word2vec-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01) Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector ...
NCHLT-inlang Pronunciation Dictionaries
(Meraka Institute, CSIR; North-West University, 2014-07-04) Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations ...
PHONAAS
(North-West University; Centre for Text Technology (CTexT), 2015-06-30) PHONAAS is a graphical user interface (GUI) tool, written in Perl and GTK2, using the R programming language and PRAAT to extract vowel formant data.
POS annotated corpus in 5 different genres for Sepedi
(Centre for Text Technology (CTexT), 2024-01-31) This corpus contains POS annotated data in 5 different genres for Sepedi. The text types included are: - CAPS gr12 (Academic) - https://www.educ ...
POS annotated corpus with 5 different text types for isiZulu
(Centre for Text Technology (CTexT), 2024-01-31) This is a POS annotated corpus with 5 different text types for isiZulu. The text types included are: - CAPS gr12 (Academic) - https://www.educat ...
Read Afrikaans Normal/ Read Afrikaans Fast
(Centre for Text Technology, North-West University, 2019-05-28) The corpus contains speech of 127 mother tongue speakers of Afrikaans. Every speaker was asked to read a text fragment from a book or a newspaper (about ...
Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad
(North-West University, 2019-02-18) Representations of epistemological certainty and ontological ambiguity in selected earlier works by Joseph Conrad