----------------------------------------------------------------------------------------------------------------------------------------------------------
README: Afrikaans morphological evaluation constructions dataset
Compiler: Benito Trollip (1)

        (1) Digital Humanities researcher specialising in Afrikaans
        SADiLaR, South African Centre for Digital Language Resources, 
        North-West University, South Africa
        benito.trollip@nwu.ac.za
        benmapstieks@gmail.com
	
Version: 2023-05-03
Language: Afrikaans

This dataset is available at https://repo.sadilar.org/handle/20.500.12185/566
----------------------------------------------------------------------------------------------------------------------------------------------------------
1. About the Afrikaans MECs dataset
2. License
----------------------------------------------------------------------------------------------------------------------------------------------------------

1. About the Afrikaans MECs dataset

This dataset contains Afrikaans morphological evaluative constructions (MECs) and their word 
frequency classes (N-values). The MECs were compiled from constructions extracted from the 
corpus collections accessible through the Virtual Institute for Afrikaans (VivA). The files are 
grouped into directories for affixes, affixoids, compounds and other types of MECs. These 
directories and their subdirectories contain xlsx-files that are described in detail in the 
doctoral thesis titled "Morfologiese evalueringskonstruksies in Afrikaans" (Trollip, 2022). 
The thesis is available for download at http://hdl.handle.net/10394/42016.

The main directory contains four subdirectories, some of which contain further subdirectories, 
holding the xlsx-files. Some of the xlsx-files contain more than one tab.
The directory structure of the dataset:
    Affixes
        Infixes
            WFK.vloekwoordinf.1.0.4.EBT.2023-05-03.xlsx
        Prefixes
            WFK.Germ.PRE.ADJ.1.0.4.EBT.2023-05-03.xlsx
            WFK.Germ.PRE.N.1.0.4.EBT.2023-05-03.xlsx
            WFK.Klas.PRE.ADJ.1.0.3.EBT.2023-05-03.xlsx
            WFK.Klas.PRE.N.1.0.4.EBT.2023-05-03.xlsx
        Suffixes
            WFK.ADJ.aard.erd.erik.1.0.3.EBT.2023-05-03.xlsx
            WFK.asie.SUF.1.0.5.EBT.2023-05-03.xlsx
            WFK.Dim-Adv.1.0.3.EBT.2023-05-03.xlsx
            WFK.iminutiewe.1.0.3.EBT.2023-05-03.xlsx
            WFK.sel.SUF.1.0.3.EBT.2023-05-03.xlsx
    Affixoids
        Prefixoids
            WFK.hond.perd.POID.1.0.5.EBT.2023-05-03.xlsx
            WFK.van.taboe.POID.1.0.6.EBT.2023-05-03.xlsx
        Suffixoids
            WFK.van.pers.SOID.1.0.6.EBT.2023-05-03.xlsx
            WFK.van.taboe.SOID.1.0.3.EBT.2023-05-03.xlsx
    Compounds
        WFK.Intensiewe.vorme.1.0.5.EBT.2023-05-03.xlsx
        WFK.kleurintensiewe.1.0.2.EBT.2023-05-03.xlsx
    Other
        WFK.redup.1.0.3.EBT.2023-05-03.xlsx
        WFK.rekurs.hipo.1.0.4.EBT.2023-05-03.xlsx
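
For users who want to take stock of the files after downloading, the listing above can be 
reproduced with a few lines of Python. This is only a sketch: the function name list_workbooks 
and the folder name "Afrikaans-MECs" are illustrative and not part of the dataset, and the 
download is assumed to have been unpacked into a local folder.

```python
from collections import defaultdict
from pathlib import Path


def list_workbooks(root):
    """Group the .xlsx files found under `root` by top-level subdirectory."""
    root = Path(root)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*.xlsx")):
        relative = path.relative_to(root)
        # The first path component is the top-level directory, e.g. "Affixes".
        # Files lying directly in `root` are grouped under ".".
        top_level = relative.parts[0] if len(relative.parts) > 1 else "."
        groups[top_level].append(relative.as_posix())
    return dict(groups)


if __name__ == "__main__":
    # "Afrikaans-MECs" is a hypothetical local folder name.
    for directory, files in list_workbooks("Afrikaans-MECs").items():
        print(directory)
        for name in files:
            print("    " + name)
```

Only the standard library is used here; opening the workbooks themselves requires a 
spreadsheet application or a separate xlsx reader.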

An extract from the dataset showing the structure of each tab of the xlsx-file:

    1. Extract from xlsx-file: WFK.vloekwoordinf.1.0.4.EBT.2023-05-03.xlsx
    
    VivA-KPO
    MEK                  N   Trefslae
    die = 15529096
    asse-fokken-blief   21          7
    befokkenslis        22          3
    ver-fokken-seker    22          4
    assefokkenblief     23          2
    assefokkenblieftog  23          2
    
    VivA-KPE
    MEK                  N   Trefslae
    die = 1209996
    asse-fokken-blief   16         24
    assefokkenblief     17         10
    verfokkenseker      17          7
    vir-fokken-seker    17          8
    fanfokkentasties    18          4

The first half of each tab lists the MECs extracted from one corpus collection (VivA-KPO), 
while the second half lists the MECs extracted from the other collection (VivA-KPE). Each 
half has three column headings: the construction (MEK, the Afrikaans equivalent of MEC), its 
word frequency class value (N), and its actual frequency in the corpus collection (Trefslae, 
Afrikaans for 'hits'). In each half, the frequency of 'die' (Afrikaans for 'the') is given 
directly below the headings. After 'die', the MECs found in the collection are listed in 
ascending order of N, that is, from the lowest N (highest 'Trefslae') to the highest N 
(lowest 'Trefslae').
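
This README does not state how N is calculated; the thesis is the authoritative source for 
that. The values in the extract above are, however, consistent with the commonly used 
definition of word frequency class, N = floor(log2(f_die / f_MEC) + 0.5), with 'die' as the 
reference word. The sketch below assumes that definition; the function name frequency_class 
is illustrative only.

```python
import math


def frequency_class(hits, reference_hits):
    """Word frequency class of an item relative to a reference word.

    Assumed definition (not stated in this README):
    N = floor(log2(reference_hits / hits) + 0.5)
    """
    if hits <= 0 or reference_hits <= 0:
        raise ValueError("frequencies must be positive")
    return math.floor(math.log2(reference_hits / hits) + 0.5)


# Frequencies of 'die' as given in the extract above.
KPO_DIE = 15529096
KPE_DIE = 1209996

print(frequency_class(7, KPO_DIE))   # asse-fokken-blief in VivA-KPO
print(frequency_class(24, KPE_DIE))  # asse-fokken-blief in VivA-KPE
```

Under this assumption, the two calls give 21 and 16, the N-values listed for 
'asse-fokken-blief' in the two halves of the extract. A lower N therefore means a more 
frequent construction relative to 'die'.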

The dataset is based on tokens extracted from the Comprehensive corpus (VivA-KPO) and Exclusive 
corpus (VivA-KPE) of the Virtual Institute for Afrikaans (VivA) between 2020 and 2022.
Each token in the dataset occurred in either the Comprehensive or the Exclusive corpus, or in both.
See http://www.viva-afrikaans.org for more information on the corpora and the composition of 
the different collections.
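
Because a token may occur in one corpus or in both, it can be useful to compare the two halves 
of a tab. The sketch below does this for the ten rows of the extract shown earlier, typed in by 
hand; the function name corpus_membership is illustrative, and reading the rows from the 
xlsx-files themselves is left out.

```python
def corpus_membership(kpo_mecs, kpe_mecs):
    """Split MECs into those found in both corpora and those found in only one."""
    kpo, kpe = set(kpo_mecs), set(kpe_mecs)
    return {
        "both": sorted(kpo & kpe),
        "kpo_only": sorted(kpo - kpe),
        "kpe_only": sorted(kpe - kpo),
    }


# MECs copied from the extract of WFK.vloekwoordinf.1.0.4.EBT.2023-05-03.xlsx.
kpo_extract = ["asse-fokken-blief", "befokkenslis", "ver-fokken-seker",
               "assefokkenblief", "assefokkenblieftog"]
kpe_extract = ["asse-fokken-blief", "assefokkenblief", "verfokkenseker",
               "vir-fokken-seker", "fanfokkentasties"]

membership = corpus_membership(kpo_extract, kpe_extract)
for label, mecs in membership.items():
    print(label, mecs)
```

The comparison is on exact spelling, so hyphenated and unhyphenated variants such as 
'ver-fokken-seker' and 'verfokkenseker' count as different tokens, as they do in the dataset.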

Please contact the compiler for more information.

----------------------------------------------------------------------------------------------------------------------------------------------------------

2. License

These files are distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 
4.0 International license (CC BY-NC-ND 4.0). 
Please read the terms of use carefully.

License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
URL: https://creativecommons.org/licenses/by-nc-nd/4.0/

Attribute work to: Trollip, E.B. 2022. Afrikaans morphological evaluation constructions dataset. 
Available from https://repo.sadilar.org/handle/20.500.12185/566
----------------------------------------------------------------------------------------------------------------------------------------------------------