Autshumato English-Tshivenḓa Parallel Corpora
Aligned parallel corpora for the English-Tshivenḓa language pair. The data was crawled from various multilingual government websites, sourced from translated material, and created by translating English sentences into Tshivenḓa. It is provided as two separate UTF-8 text files, with one aligned segment per line, so that line N of the English file corresponds to line N of the Tshivenḓa file.
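The line-by-line alignment described above can be read with a few lines of Python. This is a minimal sketch, not part of the resource itself: the function name `read_aligned_pairs` and any file names passed to it are hypothetical, since this record does not list the actual file names in the download.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_aligned_pairs(en_path: str, ve_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (English, Tshivenḓa) segment pairs from two line-aligned files.

    Assumption: both files are UTF-8 and hold one segment per line, as the
    record describes. The paths are supplied by the caller.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(ve_path, encoding="utf-8") as ve_file:
        # zip_longest pads the shorter file with None, which lets us detect
        # a length mismatch instead of silently truncating.
        pairs = zip_longest(en_file, ve_file)
        for line_no, (en, ve) in enumerate(pairs, start=1):
            if en is None or ve is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en.rstrip("\n"), ve.rstrip("\n")
```

Calling `list(read_aligned_pairs("english.txt", "tshivenda.txt"))` (placeholder file names) would load every pair into memory; iterating over the generator directly keeps memory use flat for the full corpus.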
Sunny Gent
sunny.gent@nwu.ac.za
North-West University; Centre for Text Technology (CTexT)
Creative Commons Attribution 4.0 International
English; Tshivenḓa
McKellar, Cindy
Puttkammer, Martin; Gaustad, Tanja; Gent, Sunny; van Heerden, Jacques
Autshumato V; Aligned parallel corpora; Tshivenḓa
https://hdl.handle.net/20.500.12185/682
Text
Multilingual text corpora: Aligned
There are 110,367 English-Tshivenḓa segments, consisting of 2,000,657 English words and 2,527,789 Tshivenḓa words.
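The reported totals can be turned into per-segment averages, and similar totals can be recomputed from the data. The sketch below assumes words are whitespace-delimited tokens; the record does not say how its word counts were produced, so recomputed figures may differ from the ones reported here. The names `corpus_stats` and `mean_words_per_segment` are illustrative only.

```python
from typing import Iterable, Tuple


def corpus_stats(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int, int]:
    """Return (segments, English words, Tshivenḓa words) for aligned pairs.

    Assumption: a "word" is a whitespace-delimited token.
    """
    segments = en_words = ve_words = 0
    for en, ve in pairs:
        segments += 1
        en_words += len(en.split())
        ve_words += len(ve.split())
    return segments, en_words, ve_words


def mean_words_per_segment(words: int, segments: int) -> float:
    """Average number of words per aligned segment."""
    return words / segments


# Figures taken from this record.
SEGMENTS = 110_367
EN_WORDS = 2_000_657
VE_WORDS = 2_527_789

en_mean = mean_words_per_segment(EN_WORDS, SEGMENTS)
ve_mean = mean_words_per_segment(VE_WORDS, SEGMENTS)
print(f"English: {en_mean:.2f} words per segment")
print(f"Tshivenḓa: {ve_mean:.2f} words per segment")
```

On the reported figures, the Tshivenḓa side averages more words per segment than the English side, which is consistent with its larger total word count.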
3.0 (Final)
9.74Mb
Autshumato
2024-03-27T08:27:23Z
2024-03-27T08:27:23Z
2023-12-12


This item appears in the following Collection(s)

  • Resource Catalogue [349]
    A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.