
NCHLT Sesotho Annotated Text Corpora

dc.contact.email: Martin.Puttkammer@nwu.ac.za
dc.contact.name: Martin Puttkammer
dc.contributor.author: Martin Puttkammer
dc.contributor.author: Martin Schlemmer
dc.contributor.author: Ruan Bekker
dc.date.accessioned: 2018-02-05T20:25:43Z
dc.date.accessioned: 2018-03-05T17:47:02Z
dc.date.available: 2018-02-05T20:25:43Z
dc.date.available: 2018-03-05T17:47:02Z
dc.date.issued: 2014-05-30
dc.description: Lemmatised, part of speech tagged and morphologically analysed corpora developed during the NCHLT Text project.
dc.format.medium: Token and annotation in separate columns
dc.format.medium: xls
dc.format.medium: LARA2
dc.identifier.citation: Eiselen, E.R. & Puttkammer, M.J. 2014. Developing text resources for ten South African languages. (In Proceedings of the 9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland. p. 3698-3703)
dc.identifier.islrn: 384-256-807-951-3
dc.identifier.uri: https://hdl.handle.net/20.500.12185/332
dc.language.iso: sot
dc.languages: Sesotho
dc.media.category: Monolingual text corpora: Annotated
dc.media.type: Text
dc.project: NCHLT Text
dc.publisher: North-West University
dc.publisher: Centre for Text Technology (CTexT)
dc.rights.license: Creative Commons Attribution 2.5 South Africa License: http://creativecommons.org/licenses/by/2.5/za/legalcode
dc.software.requirements: Spreadsheet software required for xls versions; LARA2 required for LARA2 versions.
dc.source: Based on documents from the South African government domain crawled from gov.za websites and collected from various language units.
dc.stratum: Details provided in documentation.
dc.title: NCHLT Sesotho Annotated Text Corpora
dc.type: Data
dc.version: 1
local.collection.primary: Resource Catalogue
local.collection.secondary: Resource Index

Files

Original bundle

Name: annotatedcorpora.nchlt.st.zip
Size: 27.43 MB
Format: ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.