Format Normaliser 1.0.
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ - Resource Index
Normalises input files to UTF-8 plain text (.txt), replaces smart quotes with straight quotes, removes empty lines, etc.
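To illustrate the kind of clean-up this entry describes, here is a minimal sketch in Python. It is not the Format Normaliser's actual code: the function name `normalise_text`, the `source_encoding` parameter, and the choice of which quote characters to map are all assumptions made for illustration.

```python
# Hypothetical sketch of text normalisation; not the tool's real implementation.

# Map common "smart" quotes to their straight ASCII equivalents (assumed set).
SMART_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalise_text(raw: bytes, source_encoding: str = "utf-8") -> str:
    """Decode bytes, straighten quotes, and drop empty lines."""
    # Undecodable bytes become U+FFFD rather than raising an error.
    text = raw.decode(source_encoding, errors="replace")
    text = text.translate(SMART_QUOTES)
    # Keep only lines that contain something other than whitespace.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

The returned string can then be written out with `open(path, "w", encoding="utf-8")` to obtain a UTF-8 `.txt` file.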
African Speech Technology Coloured-Afrikaans Speech Corpus
(North-West University; Stellenbosch University; University of Transkei; University of Free State (Qwa-Qwa campus); Rhodes University; University of KwaZulu-Natal; University of Western Cape, 2014-12-11) ~ - Resource Catalogue
African Speech Technology speech and transcription data for the Coloured-Afrikaans database. The "speech" directory contains Afrikaans speech as spoken ...
NCHLT Optical Character Recognition for South African Languages
(North-West University; Centre for Text Technology (CTexT), 2017-02-23) ~ - Resource Catalogue
An OCR system is an application that enables one to convert scanned paper documents into editable and searchable texts. The engine analyses the structure ...
AuCoPro Splitting Dataset
(North-West University; Centre for Text Technology (CTexT); Tilburg Centre for Cognition and Communication, 2015-01-07) ~ - Resource Catalogue
The AuCoPro Splitting dataset contains compounds annotated with their compound boundaries and linking morphemes for Afrikaans and Dutch.
Ragel
(North-West University; Centre for Text Technology (CTexT), 2015-01-30) ~ - Resource Index
Ragel was developed by using traditional methods for stemming/lemmatisation (i.e. affix stripping), and consists of language-specific rules for identifying ...
Pharos Speltoetser en Woordafbreker
(Pharos Dictionaries, 2013-07-01) ~ - Resource Index
Corrects typing and spelling errors and hyphenates words correctly.
UNISA Multilingual Corpus
(University of South Africa, 2018-02-28) ~ - Resource Index
The resource comprises a diverse selection of TEI P5 marked-up documents of institutional origin, written in Tswana, Afrikaans and English. This is ...
NCHLT Tagger
(North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ - Resource Catalogue
A graphical user interface and command line tool to automatically annotate running text with one or more linguistic tags: Part of Speech, Named ...
Afrikaans multi-speaker TTS corpus
(MuST, NWU, 2018-02-27) ~ - Resource Index
The aim of this corpus was to investigate the implementation of a high-quality TTS system using multiple voices recorded using a low-cost process (i.e. ...
Phonetic aligner
(Meraka Institute, CSIR, 2013-07-01) ~ - Resource Index
Scripts for automatic phonetic alignment of speech corpora using hidden Markov models (HMMs).