Creating Datasets
Before a dataset can be uploaded, the data needs to be cleaned and shaped into the correct format. As a rule, the more disaggregated the dataset, the better, as this allows a single dataset to be (re)used for multiple indicators and also allows for multivariable analysis.
The system accepts files in csv, xls and xlsx formats. The file must follow a specific structure and always contain a Geography column and a Count column.
Inside the Geography column, valid values are:
country code (ZA)
province codes (GAU, LIM, WC, etc.)
Municipal Demarcation Board codes (CTP, WC024)
Ward IDs (10204020 for Stellenbosch Ward 20 in 2016 demarcation)
or the lower level numerical geography code (e.g. 160001, 175005, etc.)
Between the Geography and the Count columns are the fields. These could be Age, Race, Education level, etc. See the example below:
| Geography | Age | Race | Child Ever Born | Count |
| --- | --- | --- | --- | --- |
| ZA | 16 | Black African | never given birth | 1 |
| ZA | 16 | Coloured | never given birth | 2 |
| ZA | 19 | Black African | Unspecified | 5 |
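One way to produce a file in this layout is to count raw records by their combination of field values. The following is a minimal sketch using only the Python standard library. The record layout, the sample values, and the function names `build_dataset_rows` and `to_csv_text` are illustrative assumptions, not part of the system.

```python
import csv
import io
from collections import Counter

# Hypothetical raw records: one tuple per person
# (geography, age, race, child ever born).
RECORDS = [
    ("ZA", "16", "Black African", "never given birth"),
    ("ZA", "16", "Coloured", "never given birth"),
    ("ZA", "16", "Coloured", "never given birth"),
    ("ZA", "19", "Black African", "Unspecified"),
]

# Geography first, the fields in between, Count last.
HEADER = ["Geography", "Age", "Race", "Child Ever Born", "Count"]


def build_dataset_rows(records):
    """Count identical (geography, field, ...) combinations into upload rows."""
    counts = Counter(records)
    return [list(key) + [count] for key, count in sorted(counts.items())]


def to_csv_text(header, rows):
    """Serialise the header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_dataset_rows(RECORDS)
csv_text = to_csv_text(HEADER, rows)
print(csv_text)
```

Keeping every field as its own column, rather than pre-aggregating to a single total per geography, is what keeps the dataset disaggregated enough to reuse across indicators.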
Column name requirements:
Must be unique when all column names are converted to lower case
Must start with a letter
Can contain letters, numbers, and spaces
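These requirements can be checked before uploading. The sketch below is one possible pre-upload check, not the system's own validation. It assumes that "letters" means ASCII letters and that no characters other than letters, numbers, and spaces are allowed. The function name `check_column_names` is hypothetical.

```python
import re

# Assumed reading of the rules: a leading ASCII letter, followed only by
# ASCII letters, digits, and spaces.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9 ]*")


def check_column_names(names):
    """Return a list of problems found; an empty list means the names pass."""
    problems = []
    seen = {}  # lower-cased name -> first spelling encountered
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid column name: {name!r}")
        key = name.lower()
        if key in seen:
            problems.append(
                f"duplicate column name: {name!r} clashes with {seen[key]!r}"
            )
        else:
            seen[key] = name
    return problems


for problem in check_column_names(["Geography", "Age", "age", "2nd field", "Count"]):
    print(problem)
```

Here `age` is reported because it collides with `Age` once lower-cased, and `2nd field` is reported because it does not start with a letter.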
Once the dataset has been sourced and shaped, it is ready to upload.
Log into the backend administration section of the website, navigate to Datasets and click Add. Give the dataset file a meaningful name, select the applicable geographical boundaries and capture source information (this will be displayed to users to help them understand where the data came from). Then upload the file from your machine.
Uploading the file kicks off a background task to process the file.
The system will alert you once this is complete. You may also check on the status of the job by viewing the queue under Django Q > queued tasks.
Once the file has been processed, you can proceed with creating indicators.
Datasets can be marked Public or Private.
Public datasets, and variables derived from them, can be used on any profile in Wazimap. This enables reuse of valuable datasets without the need for each case to source and upload the data.
Private datasets and variables derived from them can only be used on the profile they belong to.
Qualitative datasets can be uploaded and used to create qualitative indicators. A qualitative dataset must describe the relevant geography and provide the content. This content can be plain text or HTML. See the example below for how the dataset should be laid out.
| Geography | Content |
| --- | --- |
| EC | This is qualitative data |
| WC | <p>This is qualitative data</p> |
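When the content contains commas or HTML, writing the file with a CSV library rather than by hand avoids quoting mistakes. The following is a minimal sketch using Python's standard `csv` module. The sample content and the function name `qualitative_csv` are illustrative assumptions.

```python
import csv
import io

# Hypothetical content per geography code; values may be plain text or HTML.
CONTENT = {
    "EC": "This is qualitative data",
    "WC": "<p>This is qualitative data, with a comma</p>",
}


def qualitative_csv(content_by_geography):
    """Return CSV text with a Geography column and a Content column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Geography", "Content"])
    for geography, content in content_by_geography.items():
        # The writer quotes any value containing a comma, quote, or newline.
        writer.writerow([geography, content])
    return buffer.getvalue()


print(qualitative_csv(CONTENT))
```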
In the content type dropdown menu select "Qualitative". Then continue to create a variable in the same way as you would for a quantitative dataset.