Autshumato English-Tshivenḓa Parallel Corpora
Title | Autshumato English-Tshivenḓa Parallel Corpora |
Description | Aligned parallel corpora for the following language pair: English-Tshivenḓa. The data was crawled from various multilingual government websites, sourced from existing translated material, and created by translating English sentences into Tshivenḓa. The data is provided as two separate UTF-8 text files, with each aligned segment on its own line. |
Contact name | Sunny Gent |
Contact email | sunny.gent@nwu.ac.za |
Publisher(s) | North-West University; Centre for Text Technology (CTexT) |
License | Creative Commons Attribution 4.0 International |
Language(s) | English; Tshivenḓa |
Author(s) | McKellar, Cindy |
Contributor | Puttkammer, Martin; Gaustad, Tanja; Gent, Sunny; van Heerden, Jacques |
Subject | Autshumato V; Aligned parallel corpora; Tshivenḓa |
URI | https://hdl.handle.net/20.500.12185/682 |
Media type | Text |
Media category | Multilingual text corpora: Aligned |
Format extent | There are 110,367 English-Tshivenḓa segments, consisting of 2,000,657 English words and 2,527,789 Tshivenḓa words. |
Version | 3.0 (Final) |
Format size | 9.74 MB |
Project | Autshumato |
Submit date | 2024-03-27T08:27:23Z |
Date available | 2024-03-27T08:27:23Z |
Date created | 2023-12-12 |
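The description states that each language is distributed as a separate UTF-8 text file with one aligned segment per line, so line i of the English file is assumed to correspond to line i of the Tshivenḓa file. The following is a minimal Python sketch of loading the two files under that assumption. The function names are hypothetical, and the file names shown in the usage comment are placeholders because the actual names in the download are not given here. Word counts use plain whitespace splitting, which may not reproduce the Format extent figures exactly, since the tokenization behind those figures is not stated.

```python
from itertools import zip_longest
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def pair_segments(en_lines: Iterable[str], ve_lines: Iterable[str]) -> List[Pair]:
    """Pair line i of the English side with line i of the Tshivenḓa side (hypothetical helper)."""
    sentinel = object()
    pairs: List[Pair] = []
    for n, (en, ve) in enumerate(zip_longest(en_lines, ve_lines, fillvalue=sentinel), start=1):
        # If one file runs out before the other, the alignment is broken.
        if en is sentinel or ve is sentinel:
            raise ValueError(f"line counts differ (first unmatched line: {n})")
        # Strip only the line ending so segment-internal whitespace is kept.
        pairs.append((en.rstrip("\r\n"), ve.rstrip("\r\n")))
    return pairs


def load_parallel(en_path: str, ve_path: str) -> List[Pair]:
    """Read both UTF-8 files and pair them line by line (hypothetical helper)."""
    with open(en_path, encoding="utf-8") as en_f, open(ve_path, encoding="utf-8") as ve_f:
        return pair_segments(en_f, ve_f)


def corpus_stats(pairs: List[Pair]) -> Tuple[int, int, int]:
    """Return (segments, English words, Tshivenḓa words) using whitespace splitting."""
    en_words = sum(len(en.split()) for en, _ in pairs)
    ve_words = sum(len(ve.split()) for _, ve in pairs)
    return len(pairs), en_words, ve_words


# Usage (placeholder file names, assumed for illustration):
# pairs = load_parallel("english.txt", "tshivenda.txt")
# print(corpus_stats(pairs))
```

Raising an error on a length mismatch, rather than silently truncating as the built-in `zip` would, keeps a missing or extra line from shifting every later segment out of alignment.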
This item appears in the following Collection(s)
Resource Catalogue [349]
A collection of language resources available for download from the Resource Management Agency (RMA) of the South African Centre for Digital Language Resources (SADiLaR). The collection mostly consists of resources developed with funding from the Department of Arts and Culture.