Creating Datasets

Preparing the dataset

Before a dataset can be uploaded, the data needs to be cleaned and shaped into the correct format. As a rule, the more disaggregated the dataset, the better: this allows a single dataset to be (re)used for multiple indicators and also supports multivariable analysis.
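To illustrate why disaggregation helps, here is a minimal sketch in Python. The rows and counts are made-up placeholder values, and the helper total_by is a hypothetical name, not part of Wazimap. It shows one disaggregated table feeding two different summaries, one by Age and one by Race.

```python
from collections import defaultdict

# Made-up disaggregated rows: (Geography, Age, Race, Count).
rows = [
    ("WC", "15-24", "Black African", 120),
    ("WC", "15-24", "Coloured", 80),
    ("WC", "25-34", "Black African", 100),
    ("WC", "25-34", "Coloured", 60),
]


def total_by(rows, field_index):
    """Sum Count per (Geography, field value) for the chosen field."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row[0], row[field_index])] += row[-1]
    return dict(totals)


# The same rows support an age-based and a race-based summary.
by_age = total_by(rows, 1)
by_race = total_by(rows, 2)
```

Had the data been uploaded already summed by Age, the Race breakdown could not be recovered from it.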

The system accepts files in csv, xls and xlsx formats. The file must follow a specific structure and always contain the following fields:

Geography,Count
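A quick local check before uploading can catch a wrong file type or a missing required field. The sketch below is one possible way to do this for a csv file using only the Python standard library. The function names are hypothetical, and treating the field names as case-sensitive is an assumption, since the exact matching rules used by the system are not stated here.

```python
import csv
import io
from pathlib import Path

ACCEPTED_EXTENSIONS = {".csv", ".xls", ".xlsx"}
REQUIRED_FIELDS = {"Geography", "Count"}


def has_accepted_extension(filename):
    """Return True if the file name ends in one of the accepted formats."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


def missing_required_fields(csv_text):
    """Return the required fields absent from the csv header, sorted."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return sorted(REQUIRED_FIELDS - present)
```

For example, missing_required_fields("Geography,Age\nZA,15-24\n") reports that Count is missing.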

Inside the Geography column, valid values are:

  • country code (ZA)

  • province codes (GAU, LIM, WC, etc.)

  • Municipal Demarcation Board codes (CTP, WC024)

  • Ward IDs (10204020 for Stellenbosch Ward 20 in 2016 demarcation)

  • or lower-level numerical geography codes (e.g. 160001, 175005)

The fields sit between the Geography and Count columns. These could be Age, Race, Education level, etc.
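As an illustration of this layout, the following Python sketch writes a small csv with Geography first, two fields in the middle, and Count last. The field choices and all counts are placeholder values chosen for the example, not real data.

```python
import csv
import io

# Geography first, the descriptive fields in between, Count last.
header = ["Geography", "Age", "Race", "Count"]
rows = [
    ["ZA", "15-24", "Black African", 10],
    ["GAU", "15-24", "Coloured", 5],
    ["WC024", "25-34", "White", 3],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)

sample_csv = buffer.getvalue()
print(sample_csv)
```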

Column name requirements:

  • Must be unique when all column names are converted to lower case

  • Must start with a letter

  • Can contain letters, numbers, and spaces
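The column name requirements above can be checked locally before uploading. This is a minimal sketch, not the validation the system itself runs. The function name column_name_problems is hypothetical, and restricting "letters" to unaccented A-Z is an assumption.

```python
import re

# A letter first, then any mix of letters, numbers, and spaces.
COLUMN_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9 ]*")


def column_name_problems(names):
    """Return a list of human-readable problems found in the column names."""
    problems = []
    seen = {}  # lower-cased name -> first spelling encountered
    for name in names:
        if not COLUMN_NAME_PATTERN.fullmatch(name):
            problems.append(
                f"{name!r}: must start with a letter and contain only "
                "letters, numbers, and spaces"
            )
        key = name.lower()
        if key in seen:
            problems.append(f"{name!r}: duplicates {seen[key]!r} when lower-cased")
        else:
            seen[key] = name
    return problems
```

An empty list means the names satisfy all three requirements. For example, ["Geography", "age", "Age", "Count"] is flagged because "age" and "Age" collide once lower-cased.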

Once the dataset has been sourced and shaped, it is ready to upload.

Uploading the dataset

Log into the backend administration section of the website, navigate to Datasets, and click Add. Give the dataset a meaningful name, select the applicable geographical boundaries, and capture the source information (this is displayed to users to help them understand where the data came from). Then upload the file from your machine.

Uploading the file kicks off a background task to process the file.

The system will alert you once this is complete. You can also check the status of the job by viewing the queue under Django Q > queued tasks.

Once the file has been processed, you can proceed with creating indicators.

Dataset permissions and sharing

Datasets can be marked Public or Private.

Public datasets, and variables derived from them, can be used on any profile in Wazimap. This enables reuse of valuable datasets without the need for each case to source and upload the data.

Private datasets and variables derived from them can only be used on the profile they belong to.

Qualitative datasets

Qualitative datasets can be uploaded and used to create qualitative indicators. A qualitative dataset must identify the relevant geography and provide the content for it. This content can be plain text or HTML.
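A possible layout for such a file is sketched below in Python. The column names Geography and Content are assumptions made for illustration, so confirm the exact names the system expects before uploading. The content strings are placeholders. The csv module is used so that HTML containing commas or quotes is quoted correctly.

```python
import csv
import io

# Assumed column names; the exact header expected by the system may differ.
header = ["Geography", "Content"]
rows = [
    ["ZA", "Plain text content for the country."],
    ["WC024", '<p>HTML content, with a <a href="#">link</a>.</p>'],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
qualitative_csv = buffer.getvalue()

# Reading the file back recovers the original content unchanged.
parsed = list(csv.reader(io.StringIO(qualitative_csv)))
```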

In the content type dropdown menu select "Qualitative". Then continue to create a variable in the same way as you would for a quantitative dataset.
