Autshumato English-Sepedi Parallel Corpora

Date

2022-09-30

Authors

McKellar, Cindy

Publisher

CTexT® (Centre for Text Technology, North-West University)

Description

Aligned parallel corpora for the language pair English-Sepedi. The data is provided as two separate UTF-8 text files, with each aligned segment on its own line, so that a given line in one file corresponds to the same line in the other. The data was specifically selected and formatted for use in training machine translation systems. Further clean-up and processing may be required, depending on the task for which the data is reused.
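
The two-file layout described above can be read by pairing lines by position. The following is a minimal sketch, not part of the dataset itself: the function name `read_parallel` and the file paths passed to it are hypothetical, and it assumes that line N in the English file aligns with line N in the Sepedi file, as the description states.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) segment pairs from two line-aligned UTF-8 files.

    Hypothetical helper: the paths are supplied by the caller, since the
    actual file names in the deposit are not given in this record.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, which lets us detect
        # a length mismatch instead of silently dropping trailing segments.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(
                    f"files differ in length at line {line_no}; "
                    "alignment cannot be assumed"
                )
            # Strip only the line terminator; keep other whitespace intact.
            yield s.rstrip("\n"), t.rstrip("\n")
```

Raising an error on a length mismatch is a design choice for this sketch: with position-based alignment, a single missing line shifts every later pair, so failing early is usually safer than truncating. Any further clean-up (for example, dropping empty segments) would be applied to the pairs this function yields.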

License

Creative Commons Attribution 4.0 International

Verification status

Level 0