Autshumato Monolingual English Corpus
Title | Autshumato Monolingual English Corpus |
Description | Monolingual corpus for South African English. The data is provided as a single UTF-8 text file with one segment per line. The data was specifically selected and formatted for training machine translation systems. Further clean-up and processing may be required, depending on the task for which the data is reused. |
Contact name | Sunny Gent |
Contact email | sunny.gent@nwu.ac.za |
Publisher(s) | CTexT® (Centre for Text Technology, North-West University) |
License | Creative Commons Attribution 4.0 International (CC-BY 4.0): https://www.creativecommons.org/licenses/by/4.0/ |
Language(s) | English |
Author(s) | McKeller, Cindy |
Contributor | Gaustad Van Zaanen, Tanja; Puttkammer, Martin; Gent, Sunny |
Subject | Autshumato; English |
URI | https://hdl.handle.net/20.500.12185/686 |
Media type | Text |
Media category | Monolingual corpus |
Format extent | English segments: 8 832 451; English words: 188 252 040 |
Version | 1.0 (Final) |
Format size | 438 Mb |
Format medium | Text; UTF-8 |
Project | Autshumato VI |
Submit date | 2024-11-03T04:19:03Z |
Date available | 2024-11-03T04:19:03Z |
Date created | 2023-10-30 |
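The record describes the corpus as a single UTF-8 text file with one segment per line. The Python sketch below shows one possible way to stream such a file and count segments and words, for example to compare against the figures listed under Format extent. It is not part of the distributed resource. Skipping blank lines is an illustrative clean-up step, and treating whitespace-separated tokens as words is an assumption, so the resulting counts may not exactly match the published figures.

```python
"""Minimal sketch: stream a one-segment-per-line UTF-8 corpus file and
count segments and whitespace-delimited words.

Assumptions (not from the corpus documentation): blank lines are skipped,
and a "word" is a whitespace-separated token, which may not match how the
published word count was computed.
"""
from typing import Iterable, Iterator, Tuple


def iter_segments(lines: Iterable[str]) -> Iterator[str]:
    """Yield one cleaned segment per input line, skipping blank lines."""
    for line in lines:
        segment = line.strip()  # drop the newline and surrounding whitespace
        if segment:             # ignore empty lines (an assumed clean-up step)
            yield segment


def count_segments_and_words(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (segment_count, word_count) for an iterable of text lines."""
    segments = 0
    words = 0
    for segment in iter_segments(lines):
        segments += 1
        words += len(segment.split())  # whitespace tokenization (assumption)
    return segments, words


def count_file(path: str) -> Tuple[int, int]:
    """Count a corpus file line by line instead of loading it into memory."""
    with open(path, encoding="utf-8") as handle:
        return count_segments_and_words(handle)


if __name__ == "__main__":
    sample = ["The cat sat.\n", "\n", "  Second segment here  \n"]
    print(count_segments_and_words(sample))  # (2, 6)
```

For the full corpus, `count_file` would be called with the path of the downloaded text file. Any file name used there, for example `autshumato_english.txt`, is hypothetical, because this record does not list the file name. Iterating over the file handle keeps memory use roughly constant, which matters for a file of several hundred megabytes.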
This item appears in the following collection: Resource Catalogue
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.