Title: Autshumato Monolingual English Corpus
Authors: McKeller, Cindy; Gaustad Van Zaanen, Tanja; Puttkammer, Martin; Gent, Sunny
Dates: 2024-11-03; 2024-11-03; 2023-10-30
Identifier: https://hdl.handle.net/20.500.12185/686
License: Creative Commons Attribution 4.0 International (CC-BY 4.0): https://www.creativecommons.org/licenses/by/4.0/
Keywords: Autshumato; English
Format: Text; UTF8
File type: .txt
Size: 438 Mb
Extent: English Segments: 8 832 451; English Words: 188 252 040

Description: Monolingual corpus for South African English. The data is given as a single UTF-8 text file, with each segment on a new line. The data was specifically selected and formatted for use in the training of machine translation systems. Further clean-up and processing might be required, depending on the task the data is reused for.
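
Because the corpus is described as a single UTF-8 text file with one segment per line, it can be streamed line by line rather than loaded into memory at once. The following is a minimal sketch under stated assumptions: the function names are hypothetical, blank lines are skipped, and words are counted by splitting on whitespace, which may not match the method used to produce the word count given in the record.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_segments(path: Path) -> Iterator[str]:
    """Yield one segment per line of a UTF-8 text file, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            segment = line.rstrip("\n")
            if segment.strip():
                yield segment


def count_segments_and_words(path: Path) -> Tuple[int, int]:
    """Return (segment count, word count) for the file at `path`.

    Assumption: a "word" is any whitespace-delimited token. The corpus
    record does not say how its own word count was computed.
    """
    segments = 0
    words = 0
    for segment in iter_segments(path):
        segments += 1
        words += len(segment.split())
    return segments, words
```

Calling `count_segments_and_words(Path("corpus.txt"))`, where `corpus.txt` is a placeholder for the downloaded file, gives a quick sanity check against the segment and word counts listed in the record. Small differences would be expected if the original counts used a different tokenization.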