Browsing Resource Catalogue by Title
Now showing items 44-63 of 350
Autshumato Monolingual English Corpus
(CTexT® (Centre for Text Technology, North-West University), 2023-10-30) Monolingual corpus for South African English. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically ...
Autshumato Monolingual Tshivenḓa Corpus
(North-West University; Centre for Text Technology (CTexT), 2023-12-12) Monolingual corpus for Tshivenḓa. The data is given as a single UTF-8 text file, with each segment on a newline.
Autshumato Multilingual Word and Phrase Translations
(North-West University; Centre for Text Technology (CTexT), 2016-01-20) Word and phrase lists aligned from English to the other official South African languages.
Autshumato PDF Text Extractor
(North-West University; Centre for Text Technology (CTexT), 2013-06-20) Utility application for extracting text out of a PDF document. The pages can also be extracted as images.
Autshumato Sesotho sa Leboa-English Translation Memory
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) Translation memory from Sesotho sa Leboa to English (EN-GB), in the government domain, for use in the Autshumato ITE application.
Autshumato Setswana Monolingual Corpora
(North-West University; Centre for Text Technology (CTexT), 2016-10-28) Setswana monolingual corpus as a deliverable of the Autshumato project. The data is given as a UTF-8 text file, with each sentence on a new line. NOTE: ...
Autshumato Text Anonymiser
(North-West University; Centre for Text Technology (CTexT), 2013-06-20) Anonymises text by classifying and replacing sensitive information such as person names, business names, place names, monetary values, phone numbers, ...
Autshumato TMS
(North-West University; Centre for Text Technology (CTexT), 2013-06-19) Terminology Management System. Web application used by terminologists and administrators to capture, edit and export terminology.
Autshumato TMX Integrator
(North-West University; Centre for Text Technology (CTexT), 2013-06-20) Utility to merge multiple translation memories over a network using Subversion.
Autshumato Xitsonga Frequency Word List
(North-West University; Centre for Text Technology (CTexT), 2014-12-12) A list of the most frequent Xitsonga words as a deliverable of the Autshumato project.
Autshumato Xitsonga Monolingual Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-12-12) Xitsonga monolingual corpus as a deliverable of the Autshumato project. The data is given as a UTF-8 text file, with each sentence on a newline. NOTE: ...
Bilingual English-isiXhosa corpus
(North-West University - Centre for Text Technology (CTexT), 2019-11-30) Aligned parallel corpora for the following language pair: English-isiXhosa. The data is given as two separate UTF-8 text files, with each segment on a ...
Bukantswe Sesotho-English Bilingual Dictionary
(North-West University, 2016-07-07) Bilingual English-Sesotho dictionary. This dataset represents a basic Sesotho dictionary compiled in the creation of a Sesotho language resource. The ...
CGE's Afrikaans Gender Terminology List
(Commission for Gender Equality (CGE), 2021-04) CGE's Afrikaans Gender Terminology List is a list of terms, either words or phrases, related to the promotion of gender equality. All 436 words or phrases ...
Core technologies for conjunctively written South African languages
(North-West University, Centre for Language Technology (CTexT), 2021-03-31) During this SADiLaR funded project, enriched corpora for the four official South African languages with a conjunctive orthography, i.e. isiNdebele ...
Corpus of multilingual code-switched soap opera speech
(Stellenbosch University, 2020-02-28) The corpus comprises 26.9 hours of annotated multilingual speech that contains examples of code-switching in isiZulu, isiXhosa, Setswana, Sesotho and ...
COVID-19 Multilingual Terminology
(City of Tshwane; South African Centre for Digital Language Resources (SADiLaR); Department of Science and Innovation; Pan South African Language Board (PanSALB), 2021-07) COVID-19 multilingual terminology list in all the South African languages. The development of this terminology list was initiated by City of ...
CTexT Afrikaans fastText CBoW String Embeddings
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans fastText CBoW String Embeddings is a 300-dimensional Afrikaans embedding model based on the Continuous Bag of Words fastText ...
CTexT Afrikaans FLAIR Named Entity Recognition model
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans FLAIR Named Entity Recognition model is a neural NER model based on the FLAIR framework (Akbik et al. 2019), and includes Afrikaans ...
CTexT Afrikaans FLAIR Part of Speech tagger model
(Centre for Text Technology (CTexT), 2022-01-10) The CTexT Afrikaans FLAIR Part of Speech tagger model is a neural part of speech tagger model based on the FLAIR framework (Akbik et al. 2019), and ...