    Corpus of multilingual code-switched soap opera speech

    Download
    Full archive of the speech and the examples of code-switching (5lang) (2.747 GB)
    MD5: bb9d6bd19c65183615e44b442f22e4a2

    License agreement

    By downloading this resource I accept and agree to the terms of use and the associated license conditions under which the resource is distributed.

    Meta-data and transcriptions of mixed language (5lang) data (31.72 MB)
    MD5: 557915923ed0bdad73d377ffd1f139fb

    Meta-data and transcriptions of English/Sesotho code-switch data (3.685 MB)
    MD5: 6817930e35e0131fca08cb41174c8a41

    Meta-data and transcriptions of English/Setswana code-switch data (3.764 MB)
    MD5: 3370fd730e11b94e9b4f29d2a2434962

    Meta-data and transcriptions of English/isiXhosa code-switch data (3.858 MB)
    MD5: d6fd7c59880ae411304d20f266e9a977

    Meta-data and transcriptions of English/isiZulu code-switch data (5.446 MB)
    MD5: 4cf0540eac4608c711170c77b915f78d

    Development and test sets used to generate the results reported in the LREC 2020 paper (15.22 KB)
    MD5: df8b9fdd4a2e1c4680face3847ad752f

    English/Sesotho code-switch data (298.0 MB)
    MD5: 05021f2081a6bc720d6079e3d38e98b1

    English/Setswana code-switch data (299.2 MB)
    MD5: 6d019cc8d92e819cd61499ce5dd077ae

    English/isiXhosa code-switch data (322.0 MB)
    MD5: ba6d3ca27a169f32f42bcb55b7e99005

    English/isiZulu code-switch data (565.1 MB)
    MD5: 74261fcdce0bda277ee0158a54e5f30d

    README file that describes the format in which the data was delivered to DAC (5.143 KB)
    MD5: 007f13be2baedb0ee3fb0679ccb5b72d

    Test Readme (11 bytes)
    MD5: 7c5d7e68286b7a8411d0e3620302f885
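
Each download above is listed with an MD5 checksum, so a downloaded file can be checked for corruption before it is unpacked. The following is a minimal sketch of such a check in Python, using only the standard library. The function names `md5_of_file` and `verify_download` are illustrative and not part of the corpus distribution, and any local file name passed to them is an assumption, since this page does not state the names of the downloaded files.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that multi-gigabyte archives
    do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the checksum listed on this page."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, assuming the full archive was saved locally under the hypothetical name `full_archive.tar.gz`, `verify_download("full_archive.tar.gz", "bb9d6bd19c65183615e44b442f22e4a2")` would return `True` only if the file matches the checksum listed for that download. MD5 is adequate for detecting transfer errors but is not a safeguard against deliberate tampering.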

    URI
    https://hdl.handle.net/20.500.12185/545
    Collections
    • Resource Catalogue [251]
    Author(s)
    van der Westhuizen, Ewald
    Niesler, Thomas
    Description
    The corpus comprises 26.9 hours of annotated multilingual speech that contains examples of code-switching in isiZulu, isiXhosa, Setswana, Sesotho and English. The speech was obtained from South African soap operas. Code-switching between English and one of the Bantu languages is by far the most prevalent in the data. Although not very common, switches between the Bantu languages themselves also occur. An initial attempt to align the audio extracted from soap opera episodes with the corresponding scripts revealed that actors very often ad-lib. The speech and the examples of code-switching it contains can therefore be considered spontaneous.
    Contact person
    Thomas Niesler
    Contact person's e-mail address
    trn@sun.ac.za
    Publisher(s)
    Stellenbosch University

    Copyright © 2018  SADiLaR. All Rights Reserved.