Autshumato Text Anonymiser

Martin Schlemmer; Wikus Pienaar; Wildrich Fourie; Ismail Lavangee; Cindy McKellar; Gordon Matthews; Marissa Griesel (North-West University; Centre for Text Technology (CTexT), 2013-06-20) ~ Resource Catalogue

Anonymises text by classifying and replacing sensitive information such as person names, business names, place names, monetary values, phone numbers, ...

CTexT Multilingual Text Corpora

Unknown author (North-West University; Centre for Text Technology (CTexT), 2015-02-03) ~ Resource Index

Document-level aligned corpora for machine translation purposes.

NCHLT Xitsonga Named Entity Annotated Corpus

N.C.P. Golele; X.E. Mabaso; Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.

NCHLT Xitsonga Annotated Text Corpora

Martin Puttkammer; Martin Schlemmer; Ruan Bekker (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part-of-speech tagged and morphologically analysed corpora developed during the NCHLT Text project.

NCHLT Xitsonga word2vec-CBOW embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides ...

NCHLT Xitsonga fastText-Skipgram embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...

Lwazi Xitsonga Pronunciation Dictionary

Marelie Davel (Meraka Institute, CSIR, 2013-04-01) ~ Resource Catalogue

General phonemic pronunciations for frequently occurring words in SA languages. Dictionaries were developed to be practically usable for speech technology ...

NCHLT Xitsonga Phrase Chunk Annotated Corpus

N.C.P. Golele; Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue

Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...

NCHLT Xitsonga Lemmatiser

Martin Puttkammer; Martin Schlemmer; Ruan Bekker (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatiser developed during the NCHLT Text project. Available in the Readme.txt - Input format: Text data (encoding: UTF8 without BOM), one ...

Human Language Technology Audit 2017/18

Moors, Carmen; Wilken, Ilana; Gumede, Tebogo; Calteaux, Karen (CSIR, 2018-08-31)

This document reports on all work conducted in the 2017/18 Audit of human language technology (HLT) resources available in South Africa project. The ...
