Autshumato English-Tshivenḓa Parallel Corpora

License: Creative Commons Attribution 4.0 International: https://creativecommons.org/licenses/by/4.0/deed.en
Authors: McKellar, Cindy; Puttkammer, Martin; Gaustad, Tanja; Gent, Sunny; van Heerden, Jacques
Dates: 2024-03-27; 2024-03-27; 2023-12-12
Identifier: https://hdl.handle.net/20.500.12185/682

Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced from translated material and created by translating English sentences into Tshivenḓa. The data is given as two separate UTF-8 text files, with each aligned segment on a newline.

There are 110,367 English-Tshivenḓa segments, consisting of 2,000,657 English words and 2,527,789 Tshivenḓa words.

Format: Txt
Keywords: Autshumato V; Aligned parallel corpora; Tshivenḓa
Size: 9.74 MB
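Because the corpus is described as two UTF-8 text files with one aligned segment per line, a pair of segments can be recovered by reading both files in step. The following is a minimal sketch under that assumption. The function names `read_aligned` and `corpus_stats` are hypothetical, the file paths are supplied by the caller (the record does not name the files), and words are counted by splitting on whitespace, which may not match the tokenization behind the published word counts.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple


def read_aligned(src_path: Path, tgt_path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs, pairing line N of one file with line N of the other."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            # zip_longest pads the shorter file with None, which signals
            # that the two files do not have the same number of lines.
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\r\n"), t.rstrip("\r\n")


def corpus_stats(src_path: Path, tgt_path: Path) -> Tuple[int, int, int]:
    """Return (segments, source words, target words) using whitespace tokenization."""
    segments = src_words = tgt_words = 0
    for s, t in read_aligned(src_path, tgt_path):
        segments += 1
        src_words += len(s.split())
        tgt_words += len(t.split())
    return segments, src_words, tgt_words
```

Calling `corpus_stats` on the two downloaded files gives a quick check against the segment count stated in the record. Raising an error on a length mismatch is a deliberate choice here: silently truncating to the shorter file would hide a broken alignment.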