NCHLT Sesotho Speech Corpus

Charl van Heerden; Etienne Barnard; Jaco Badenhorst; Marelie Davel; Alta de Waal (Meraka Institute, CSIR; North-West University, 2014-07-08) ~ Resource Catalogue

Orthographically transcribed broadband speech corpus of approximately 56 hours, including a test suite of 8 speakers.

COVID-19 Multilingual Terminology

City of Tshwane; South African Centre for Digital Language Resources (SADiLaR); Department of Science and Innovation (DSI); Pan South African Language Board (PanSALB) (City of Tshwane; South African Centre for Digital Language Resources (SADiLaR); Department of Science and Innovation; Pan South African Language Board (PanSALB), 2021-07)

COVID-19 multilingual terminology list document in all the South African languages. The development of this terminology list was initiated by City of ...

South African Multilingual Proper Names (Multipron) Corpus

Etienne Barnard; Marelie Davel; Oluwapelumi Giwa; Nadia Barnard; Jean-Pierre Martens; Derik Thirion (Molo Afrika Speech Technologies, 2013-10-03) ~ Resource Catalogue

Audio, orthographic transcriptions and auditorily verified broad phonemic transcriptions of proper names in four languages, produced by speakers of the same four languages.

Lara2

Martin Puttkammer; Martin Schlemmer (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Tool for annotating texts with lemma, part-of-speech and morphological analysis information.

Lwazi Sesotho ASR corpus

Charl van Heerden; Etienne Barnard; Jaco Badenhorst; Marelie Davel (Meraka Institute, CSIR, 2013-04-02) ~ Resource Catalogue

Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.

NCHLT Sesotho Annotated Text Corpora

Martin Puttkammer; Martin Schlemmer; Ruan Bekker (North-West University; Centre for Text Technology (CTexT), 2014-05-30) ~ Resource Catalogue

Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.

High quality TTS data for four South African languages (af, st, tn, xh)

Unknown author (Google; North-West University, 2017) ~ Resource Catalogue

This data set contains high-quality, multi-speaker transcribed audio data for TTS in four languages of South Africa: Afrikaans, Sesotho, Setswana and isiXhosa. ...

Lwazi II Sesotho TTS Corpus

Daniel van Niekerk; Georg Schlünz (Meraka Institute, CSIR; North-West University, 2015-11-20) ~ Resource Catalogue

Orthographic and phonemically aligned transcriptions.

NCHLT Sesotho GloVe embeddings

Roald Eiselen (North-West University; Centre for Text Technology (CTexT), 2023-05-01)

Static word embedding model based on the Global Vectors architecture (Pennington et al., 2014). The embeddings provide real-valued vector representations ...

Corpus of multilingual code-switched soap opera speech

Ewald van der Westhuizen; Thomas Niesler (Stellenbosch University, 2020-02-28)

The corpus comprises 26.9 hours of annotated multilingual speech that contains examples of code-switching in isiZulu, isiXhosa, Setswana, Sesotho and ...
