NCHLT English Text Corpora

dc.contact.email: Martin.Puttkammer@nwu.ac.za
dc.contact.name: Martin Puttkammer
dc.contributor.author: Martin Puttkammer
dc.contributor.author: Martin Schlemmer
dc.contributor.author: Wikus Pienaar
dc.contributor.author: Ruan Bekker
dc.date.accessioned: 2018-02-05T20:22:41Z
dc.date.accessioned: 2018-03-05T17:45:55Z
dc.date.available: 2018-02-05T20:22:41Z
dc.date.available: 2018-03-05T17:45:55Z
dc.date.issued: 2016-09-09
dc.description: Collection consisting of a clean corpus, lexicon, frequency list and named-entity lists developed during the NCHLT Text project.
dc.format.extent: 9.47 MB
dc.format.medium: Text
dc.format.medium: UTF8
dc.identifier.citation: Eiselen, E.R. & Puttkammer, M.J. 2014. Developing text resources for ten South African languages. (In Proceedings of the 9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland. p. 3698-3703)
dc.identifier.islrn: 481-998-928-542-6
dc.identifier.uri: https://hdl.handle.net/20.500.12185/301
dc.language.iso: eng
dc.languages: English
dc.media.category: Monolingual text corpora: Unannotated
dc.media.type: Text
dc.project: NCHLT Text
dc.publisher: North-West University
dc.publisher: Centre for Text Technology (CTexT)
dc.rights.license: Creative Commons Attribution 2.5 South Africa License: http://creativecommons.org/licenses/by/2.5/za/legalcode
dc.source: Based on documents from the South African government domain crawled from gov.za websites and collected from various language units.
dc.stratum: Details provided in documentation.
dc.title: NCHLT English Text Corpora
dc.type: Data
dc.version: 1
local.collection.primary: Resource Catalogue
local.collection.secondary: Resource Index

Files

Original bundle

Name: corpora.nchlt.en.zip
Size: 9.47 MB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
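
Since the bundle is a ZIP archive, its contents can be inspected before extraction. The following is a minimal sketch in Python using only the standard-library zipfile module. It assumes the bundle has already been downloaded to the working directory under the name shown above. The helper name list_archive is hypothetical, and the internal layout of the archive is not described in this record, so the sketch only lists whatever members it finds.

```python
import zipfile
from pathlib import Path


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each file in a ZIP archive.

    Hypothetical helper for illustration; directory entries are skipped.
    """
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


if __name__ == "__main__":
    # Assumption: the bundle from this record was saved locally under its listed name.
    bundle = Path("corpora.nchlt.en.zip")
    if bundle.exists():
        for name, size in list_archive(bundle):
            print(f"{size:>12,d}  {name}")
    else:
        print(f"{bundle} not found; download it from the record first.")
```

The text files are listed as UTF8 in this record, so individual members would presumably be opened with encoding="utf-8" after extraction.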