Autshumato Monolingual Setswana Corpus

License: Creative Commons Attribution 4.0 International
Authors: McKellar, Cindy; Gaustad Van Zaanen, Tanja; Puttkammer, Martin; Gent, Sunny
Dates: 2022-12-15; 2022-12-15; 2022-09-30
Identifier: https://hdl.handle.net/20.500.12185/584

Description: Monolingual corpus for Setswana. The data is given as a single UTF-8 text file, with each segment on a newline. The data was specifically selected and formatted for use in the training of machine translation systems. Further clean-up and processing might be required depending on the task the data is reused for.

Format: Txt
Media type: Text; UTF8
Setswana Segments: 268 615
Setswana Words: 5 205 832
Project: Autshumato
Language: Setswana
Size: 10.0 Mb (zipped)
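
Because the description states that the corpus is a single UTF-8 text file with one segment per line, a reader may want to check the segment and word figures after unzipping. The following is a minimal sketch in Python, not part of the dataset. It assumes that blank lines are not segments and that words are whitespace-separated tokens; the record does not say how the published counts were computed, so the results may differ. The function names and any file path passed to them are hypothetical.

```python
from typing import Iterable, Tuple


def count_segments_and_words(lines: Iterable[str]) -> Tuple[int, int]:
    """Count non-empty lines (segments) and whitespace-separated tokens (words).

    Assumption: one segment per line, words split on whitespace.
    """
    segments = 0
    words = 0
    for line in lines:
        text = line.strip()
        if not text:
            # Assumption: blank lines are skipped rather than counted as segments.
            continue
        segments += 1
        words += len(text.split())
    return segments, words


def count_file(path: str) -> Tuple[int, int]:
    """Stream a UTF-8 corpus file from disk without loading it all into memory.

    `path` is whatever name the unzipped corpus file has locally (hypothetical).
    """
    with open(path, encoding="utf-8") as handle:
        return count_segments_and_words(handle)
```

Calling `count_file` with the local path of the unzipped file returns a `(segments, words)` pair that can be compared against the figures listed in the record.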