    Bilingual English-Siswati Corpus

    Download
    Language pair: English-SiSwati; four separate UTF-8 text files (9.543 MB)
    MD5: 457410fc477f26eb7916403a8c11ffbb
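
    The MD5 checksum can be used to confirm that the download arrived intact. Below is a minimal Python sketch, assuming the published checksum covers the file exactly as served by the Download link. The function names `md5_of_file` and `verify_download` are hypothetical helpers, not part of any SADiLaR tool.

```python
import hashlib

# Checksum published on this record for the download.
EXPECTED_MD5 = "457410fc477f26eb7916403a8c11ffbb"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Hypothetical helper: True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

    For example, `verify_download("downloaded-file")` returns `True` only when the digests match; the file name here is a placeholder for wherever the download was saved.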

    License agreement

    By downloading this resource I accept and agree to the terms of use and the associated license conditions under which the resource is distributed.

    URI
    https://hdl.handle.net/20.500.12185/560
    Collections
    • Resource Index [409]
    Author(s)
    McKellar, Cindy
    Description
    Aligned parallel corpora for the English-SiSwati language pair. The data is provided as four separate UTF-8 text files, with one segment per line. The dataset combines existing data sourced for the DSAC-funded Autshumato project with new data sourced for the SADiLaR project "Parallel corpora for English into SiSwati". It contains the following types of bilingual data: translations from English to SiSwati, and crawled English-SiSwati parallel data. In total, the dataset comprises 114,839 segments, with 2,002,293 English words and 1,423,414 SiSwati words.
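
    Because each file holds one segment per line, corresponding lines of an English file and a SiSwati file can be paired by position. Below is a minimal Python sketch, assuming the four files form English/SiSwati pairs (the record does not say how the four files are divided) and using hypothetical function names. It reads two line-aligned files, checks that their segment counts match, and counts whitespace-separated words. Its word totals need not reproduce the figures above, because the tokenization behind those counts is not stated.

```python
from typing import Iterable, List, Tuple


def read_segments(path: str) -> List[str]:
    """Read one segment per line from a UTF-8 file (a leading BOM, if any, is dropped)."""
    with open(path, encoding="utf-8-sig") as handle:
        return [line.rstrip("\r\n") for line in handle]


def aligned_pairs(src_path: str, tgt_path: str) -> List[Tuple[str, str]]:
    """Pair line i of the source file with line i of the target file."""
    src = read_segments(src_path)
    tgt = read_segments(tgt_path)
    # Line-aligned files must contain the same number of segments.
    if len(src) != len(tgt):
        raise ValueError(f"segment count mismatch: {len(src)} vs {len(tgt)}")
    return list(zip(src, tgt))


def whitespace_word_count(segments: Iterable[str]) -> int:
    """Count words by splitting each segment on whitespace (an assumed tokenization)."""
    return sum(len(segment.split()) for segment in segments)
```

    A segment-count mismatch usually means a file was truncated or that two files from different pairs were combined, so raising an error is safer than silently pairing misaligned lines.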
    Contact person
    Tanja Gaustad
    Contact person's e-mail address
    tanja.gaustad@nwu.ac.za
    Publisher(s)
    North-West University - Centre for Text Technology (CTexT)
    License
     

    Copyright © 2018  SADiLaR. All Rights Reserved.
    Contact Us | Send Feedback