
NCHLT Tshivenda Text Corpora

dc.contact.email: Martin.Puttkammer@nwu.ac.za
dc.contact.name: Martin Puttkammer
dc.contributor.author: Martin Puttkammer
dc.contributor.author: Martin Schlemmer
dc.contributor.author: Wikus Pienaar
dc.contributor.author: Ruan Bekker
dc.date.accessioned: 2018-02-05T20:25:55Z
dc.date.accessioned: 2018-03-05T17:47:55Z
dc.date.available: 2018-02-05T20:25:55Z
dc.date.available: 2018-03-05T17:47:55Z
dc.date.issued: 2014-05-30
dc.description: Collection of source text documents, genre classified text documents, raw corpus, clean corpus, lexicon, frequency list and named-entity lists developed during the NCHLT Text project.
dc.format.extent: 5.69 Mb
dc.format.medium: Text
dc.format.medium: UTF8
dc.identifier.citation: Eiselen, E.R. & Puttkammer, M.J. 2014. Developing text resources for ten South African languages. (In Proceedings of the 9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland. p. 3698-3703)
dc.identifier.islrn: 450-604-191-615-2
dc.identifier.uri: https://hdl.handle.net/20.500.12185/357
dc.language.iso: ven
dc.languages: Tshivenda
dc.media.category: Monolingual text corpora: Unannotated
dc.media.type: Text
dc.project: NCHLT Text
dc.publisher: North-West University
dc.publisher: Centre for Text Technology (CTexT)
dc.rights.license: Creative Commons Attribution 2.5 South Africa License: http://creativecommons.org/licenses/by/2.5/za/legalcode
dc.source: Based on documents from the South African government domain crawled from gov.za websites and collected from various language units.
dc.stratum: Details provided in documentation.
dc.title: NCHLT Tshivenda Text Corpora
dc.type: Data
dc.version: 1
local.collection.primary: Resource Catalogue
local.collection.secondary: Resource Index

Files

Original bundle

Name: corpora.nchlt.ve.zip
Size: 5.7 MB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.