Search
Now showing items 21-30 of 165
Autshumato PDF Text Extractor
(North-West University; Centre for Text Technology (CTexT), 2013-06-20) ~ - Resource Catalogue
Utility application for extracting text out of a PDF document. The pages can also be extracted as images.
NCHLT Afrikaans RoBERTa language model
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned ...
Pharos 5-in-1 Dictionaries
(Pharos Dictionaries, 2013-07-01) ~ - Resource Index
Collection of three bilingual (Afr/Eng) dictionaries and two monolingual (Afr) dictionaries
NCHLT-inlang Pronunciation Dictionaries
(Meraka Institute, CSIR; North-West University, 2014-07-04) ~ - Resource Catalogue
Broad phonemic transcriptions for 15,000 generic words in each of 11 languages. Each dictionary has an associated rule set for generating pronunciations ...
NCHLT Afrikaans Text Corpora
(North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ - Resource Catalogue
Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed ...
Multilingual Soccer Terminology List
(Terminology Coordination Section of the National Language Service, Department of Arts and Culture, 2017-03-03) ~ - Resource Index
297 English source terms with their equivalents in the ten other official South African languages. On the eve of the 2010 FIFA World Cup, the list was ...
NCHLT Afrikaans fastText-Skipgram embeddings
(North-West University; Centre for Text Technology (CTexT), 2023-05-01)
Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued ...
PharosOnline
(Pharos Dictionaries, 2013-07-01) ~ - Resource Index
Collection of 18 subject and general dictionaries and word lists in Afrikaans and/or English
NCHLT Afrikaans Speech Corpus
(Meraka Institute, CSIR; North-West University, 2014-07-08) ~ - Resource Catalogue
Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.
Asterisk Nuance 1.4.
(Molo Afrika Speech Technologies, 2015-01-28) ~ - Resource Index
Integration of commercial Nuance speech-recognition engine to the Asterisk open-source platform.