
Wazimap profile curation handbook

Start Here

Introduction

Welcome to the Profile Curation Handbook — the administrative guide for the all new Wazimap NG!

Wazimap NG (Next Generation) provides easy access to different kinds of data through a mapped and open-source information system, designed to help non-technical users explore data, both meaningfully and in context. In other words, it is a Geographic Information System (GIS) tool “for the rest of us”.

This Profile Curation Handbook contains documentation on how to manage and upload datasets, as well as create and manage profiles and their associated indicators. There are two administration roles:

Data Administrator: responsible for sourcing, shaping, and uploading datasets and point collections to Wazimap NG.
Profile Administrator: responsible for defining and managing profile indicators and overall site content. In short, responsible for how data looks on Wazimap NG.

Both these roles have a function in Wazimap NG’s three views. It is, however, possible for these roles to be performed by the same person. The three views are Point Mapper, Data Mapper, and Rich Data. The table below shows the respective logos for each view of Wazimap NG.

Logo

Name

Point Mapper

Data Mapper

Rich Data

The rest of this Profile Curation Handbook is structured according to these three views of Wazimap NG, and the various roles that Data and Profile Administrators play for each.

Point Mapper

What is Point Mapper?

Point Mapper allows for a number of locations (called point collections) to be added and viewed on a map using Wazimap NG.

Figure 1, below, is an example of such point mapping, and shows the location of Water Treatment Works within the City of Cape Town Metro of South Africa.

The first step towards creating such a map is sourcing and understanding your data, and knowing what you are trying to display using your data. This is often the hardest part, and requires collaboration between Data and Profile Administrators. It is also important to know a dataset’s Terms of Use or Licence, and what this allows you to use the data for.

Wazimap NG uses .csv files to display point collections, and these .csv files require a particular shaping (formatting) to be properly understood by the Wazimap NG platform. This is the first step towards creating your map.

Shaping Data for Point Collections

The easiest tool to create or edit .csv files is Google Sheets — it can open both .csv and .xlsx files, and can export .csv files for upload to Wazimap NG. Any point collection requires the following three fields (as highlighted in red in Figure 2, below):

name;
longitude; and
latitude.

The name field (column) of your point collection cannot be formatted as a number. It may be necessary to format the entire column to plain text by selecting the entire column and clicking Format > Number > Plain text (see Figure 3, below).

Additionally, column headings cannot start with a number (e.g., 2019 HDI), contain periods (e.g., SDG 3.1), or contain square brackets (e.g., HDI over the Years [Average]).

No cell within the first row below the column headers can be left blank either. If no attribute exists for one of these cells, enter the text "no data" instead. A blank cell will result in that attribute not being displayed for any points (even if they have a value).

Once your point collection is complete (has the three required fields, is formatted correctly, and contains no blank cells in the first row), it can be exported as a .csv file. To do this, click File > Download > Comma-separated values (.csv) (see Figure 4, below).

Take note of where this .csv file downloads to — you will need to locate it for upload to Wazimap NG later.
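The shaping rules in this section can be checked before upload with a short script. The following is a minimal Python sketch, not part of Wazimap NG: the function name check_point_csv is hypothetical, and matching the required field names case-insensitively is an assumption. It does not check whether the name column is stored as text, since a .csv file carries no cell formatting.

```python
import csv
import io

REQUIRED_FIELDS = {"name", "longitude", "latitude"}


def check_point_csv(text):
    """Return a list of problems found in point collection CSV text (empty if none)."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    headings = rows[0]

    # Required fields (case-insensitive matching is an assumption of this sketch).
    missing = REQUIRED_FIELDS - {h.strip().lower() for h in headings}
    for field in sorted(missing):
        problems.append(f"missing required field: {field}")

    # Heading rules: no leading number, no periods, no square brackets.
    for heading in headings:
        h = heading.strip()
        if h[:1].isdigit():
            problems.append(f"heading starts with a number: {h}")
        if "." in h:
            problems.append(f"heading contains a period: {h}")
        if "[" in h or "]" in h:
            problems.append(f"heading contains square brackets: {h}")

    # No blank cells in the first row below the column headings.
    if len(rows) > 1:
        first = rows[1]
        for i, heading in enumerate(headings):
            value = first[i] if i < len(first) else ""
            if not value.strip():
                problems.append(f"blank cell in first data row: {heading.strip()}")
    return problems
```

Calling check_point_csv(open("points.csv", encoding="utf-8").read()) on the exported file (points.csv is a placeholder name) returns an empty list when none of the checked rules are broken.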

Uploading Point Collections

On the following page, do the following (refer to Figure 6, below):

Select (from the dropdown list) the Profile which should be associated with your point collection;
Give your point collection a Name;
Select the appropriate Permission type (Public or Private);
Import your point collection by clicking Choose file and selecting the .csv file you exported earlier;
Add the Source information; and
Scroll down and click Save in the bottom right hand corner.

Public & Private Datasets

Datasets can be set as Public or Private under Permission type. Public datasets, and variables derived from them, can be used on any profile in Wazimap NG. This enables reuse of valuable datasets without the need for each use case to source and upload the data again. Private datasets, and variables derived from them, can only be used on the profile they are assigned to under Profile.

---

Your point collection (dataset) should now be added to Django. To display your point collection on Wazimap NG, you must first create a Theme, and then a Profile Collection (from your point collection), which will be nested under your Theme. Themes and Profile Collections are vital in displaying data using the Point Mapper on Wazimap NG.

Creating Themes for Profile Collections

To create a Theme, scroll down to the POINTS section again, and this time click on the +Add button next to Themes (see Figure 7, below).

On the following page, do the following (refer to Figure 8, below):

Select (from the dropdown list) the Profile which should be associated with your Theme;
Give your Theme an appropriate Name (this will be displayed on Wazimap NG);
Select a preferred Icon; and
Scroll down and click Save in the bottom right hand corner.

Creating Profile Collections from Point Collections

To create a Profile Collection, scroll down to the POINTS section again, and this time click on the +Add button next to Profile Collections (see Figure 10, below).

On the following page, do the following (refer to Figure 11, below):

Select (from the dropdown list) the Profile which should be associated with your Profile Collection;
Select (from the dropdown list) the Theme you created;
Select (from the dropdown list) the Collection (point collection) you uploaded;
Give your Profile Collection an appropriate Label (this will be displayed on Wazimap NG); and
Scroll down and click Save in the bottom right hand corner.

Currently the Icon and Colour functionalities are not working. These fields can be left empty.

Adding Filters to Profile Collections

Any values in the filterable_fields array that are not attributes of your Point Collection will be ignored.

Adding HTML Field Types

In addition to adding filters, you can define the field type of columns in your Profile Collection to render data according to HTML code. This is particularly useful for linking to additional, external information (e.g., linking to the source of your points).

The HTML code for adding links is:

<a href='url' target='_blank'>name</a>

In addition to the href HTML attribute, the following are also allowed: class, target, data-*, and style.
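If many points need a link, the HTML can be generated before the .csv file is uploaded. The following is a minimal Python sketch that builds the link markup shown above for one cell; the function name make_link and the example URL are hypothetical, and escaping the values is a precaution added here rather than a documented Wazimap NG requirement.

```python
from html import escape


def make_link(url, name):
    """Build an anchor tag in the form shown above, escaping both values."""
    return "<a href='{}' target='_blank'>{}</a>".format(
        escape(url, quote=True), escape(name)
    )


# Example with a placeholder URL.
link = make_link("https://example.org/source", "Source")
```

The resulting string can be written into the relevant column of the point collection before export.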

---

If all steps were followed correctly, your point collection will now display on your Wazimap NG Profile (sometimes a hard refresh of the Wazimap NG page is required for changes to reflect). The sections that follow offer some additional information for displaying point collections on your Wazimap NG profile.

Uploading additional points to an existing Point Collection

Next, in Django, scroll down to the POINTS section, and this time click on Collections (see Figure 15, below).

The page that opens will contain all point collections for all Wazimap NG profiles. There are two ways to locate an existing point collection (see Figure 16, below):

If you know the point collection’s name, simply enter it in the Search bar (top left), and click Search; or
Filter point collections by the associated Profile (top right), and locate the desired point collection in the list.

Editing existing Point Data in Django

It is possible to edit individual points within an existing point collection in Django. To do so, scroll down to the POINTS section, and this time click on Locations (see Figure 17, below).

The page that opens will contain all points for all Wazimap NG profiles. Points can be located in the same way as point collections:

If you know the point’s name, simply enter it in the Search bar (top left), and click Search;
Filter points by the associated Profile (top right), and locate the desired point in the list; or
Filter points by the point collection name.

Once located, click on the point’s name. On the page that opens, it is possible to edit a point’s name (see Figure 18, below), and its attributes (see Figure 19, below). Attributes appear in code format.

Referring to Figure 19, for each attribute associated with a point, there is a key and a value. A key corresponds to a column heading in the originally uploaded .csv file, and a value to a cell in that column. For this reason, it is advisable NOT TO EDIT a key, but only a value.

Bulk updates to an existing point collection

Long term maintenance to a point collection often involves the following tasks:

Removing points that are no longer relevant;
Adding points that are not already in the database; and
Updating data about points already in the database (e.g., opening hours, services provided, correcting/improving a description).

To do this, you need to be able to compare your updated data to the data you have already uploaded to a profile collection in Wazimap NG. The easiest way to do this is to use unique identifiers.

Adding Unique Identifiers

An identifier is a value which uniquely and consistently identifies an object — in this case, a point in your point collection.

It is important to include some kind of identifier in your point data to facilitate updates to the data. Users down the line will rely on your identifiers being unique and consistent to be able to incorporate updates should they download a copy of your point data.

If your data does not have an official identifier that is consistent over time, and unique per point, it will become difficult to check for any duplicates or whether newly-provided points are already in your database (e.g., by name, address, and so on). Think about what your users will be able to provide you, and include that in what is shown to them as well.

You can use the following formula in Google Sheets to create a UUID (Copy this into the appropriate cell):

=CONCATENATE(DEC2HEX(RANDBETWEEN(0;4294967295);8);"-";DEC2HEX(RANDBETWEEN(0;42949);4);"-";DEC2HEX(RANDBETWEEN(0;42949);4);"-";DEC2HEX(RANDBETWEEN(0;42949);4);"-";DEC2HEX(RANDBETWEEN(0;4294967295);8);DEC2HEX(RANDBETWEEN(0;42949);4))

Make sure to copy and paste as text as well (not the formula), into a new column, and use the text version of it going forward. You don't want it to calculate a new random UUID for existing points.
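As an alternative to the spreadsheet formula, identifiers can be generated with a script. The following is a minimal Python sketch using the standard uuid module; the function name add_identifiers and the column name uuid are assumptions for illustration. Existing identifiers are kept, so a point never receives a new random UUID once it has one.

```python
import csv
import io
import uuid


def add_identifiers(text, column="uuid"):
    """Return CSV text with a UUID filled in for every row that lacks one."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        fieldnames.append(column)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep existing identifiers; only generate one where the cell is empty.
        if not (row.get(column) or "").strip():
            row[column] = str(uuid.uuid4())
        writer.writerow(row)
    return out.getvalue()
```

Because the output is plain text, there is no formula left to recalculate when the file is reopened.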

Checking for Duplicates & Removing them

Google Sheets has a built-in function to check for and remove duplicates. In Google Sheets, select the data of interest, and click on Data > Data clean-up > Remove Duplicates (see Figure 20, below). In the box that appears (see Figure 21, below), be sure to select the option Data has header row if it indeed does.

Add a deduplication step.
Select a column whose values probably ought to be unique.
Look for rows where the duplicate number is greater than 1.

Merging Updates

If you have a consistent UUID (unique identifier), then:

Navigating Point Mapper

Video to be added

Profile Admin

Creating Datasets

Preparing the dataset

Before a dataset can be uploaded, the data needs to be cleaned and shaped into the correct format. As a rule, the more disaggregated the dataset, the better, as this allows a single dataset to be (re)used for multiple indicators and also allows for multivariable analysis.

The system accepts files in csv, xls and xlsx formats. The file needs to adhere to a specific structure and must always contain the following fields:

Geography,Count

Inside the Geography column, valid values are:

country code (ZA)
province codes (GAU, LIM, WC, etc.)
Municipal Demarcation Board codes (CTP, WC024)
Ward IDs (10204020 for Stellenbosch Ward 20 in 2016 demarcation)
or the lower level numerical geography code (e.g. 160001, 175005, etc.)

In between the Geography and the Count columns are the fields. These could be Age, Race, Education level, etc. See the example below:

Geography

Age

Race

Child Ever Born

Count

Black African

never given birth

Coloured

never given birth

Black African

Unspecified

Column name requirements:

Must be unique when all column names are converted to lower case
Must start with a letter
Can contain letters, numbers, and spaces
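The column name requirements above can be checked with a short script before uploading. The following is a minimal Python sketch, not the validation Wazimap itself performs: the function name check_dataset_columns is hypothetical, and treating "letters" as the unaccented letters A to Z is an assumption.

```python
import re

# A letter first, then letters, numbers, or spaces (ASCII only is an assumption).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9 ]*")


def check_dataset_columns(columns):
    """Return a list of problems with a dataset's column names (empty if none)."""
    problems = []
    for required in ("Geography", "Count"):
        if required not in columns:
            problems.append(f"missing required column: {required}")
    seen = set()
    for col in columns:
        if not NAME_PATTERN.fullmatch(col):
            problems.append(f"invalid column name: {col}")
        # Names must stay unique once converted to lower case.
        key = col.lower()
        if key in seen:
            problems.append(f"duplicate column name (ignoring case): {col}")
        seen.add(key)
    return problems
```

For the example header above, check_dataset_columns(["Geography", "Age", "Race", "Child Ever Born", "Count"]) returns an empty list.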

Once the dataset has been sourced and shaped, it is ready to upload.

Uploading the dataset

Log into the backend administration section of the website, navigate to Datasets, and click Add. Give the dataset file a meaningful name, select the applicable geographical boundaries, and capture the source information (this will be displayed to users to help them understand where the data came from). Then upload the file from your machine.

Uploading the file kicks off a background task to process the file.

The system will alert you once this is complete. You may also check on the status of the job by viewing the queue Django Q > queued tasks.

Datasets can be marked Public or Private.

Public datasets, and variables derived from them, can be used on any profile in Wazimap. This enables reuse of valuable datasets without the need for each case to source and upload the data.

Private datasets and variables derived from them can only be used on the profile they belong to.

Qualitative datasets

Qualitative datasets can be uploaded and used to create qualitative indicators. A qualitative dataset must describe the relevant geography and provide the content. This content can be plain text or HTML. See the example below for how the dataset should be laid out.

Geography

Content

This is qualitative data

<p>This is qualitative data</p>

Sub-Indicator groups (columns)

Reordering subindicators

You may want to change the order in which the sub-indicators are shown. For example, you may want to swap Agree and Disagree in the chart below.

In the Admin Suite, find SubindicatorsGroups.

Find the relevant indicator:

Then drag the subindicators into the desired order:

The order should now be changed on the front-end. In order to see it you will need to hard-refresh your browser - ctrl + shift + R (this will be fixed soon).

Non-aggregatable columns

Values are summed over all dimensions other than the indicator variable by default. That means an indicator on the column "financial year" from a dataset with columns "financial year" and "income source" will disaggregate by financial year, and show the sum of the different income sources, unless the user adds a filter on income source.

It sometimes doesn't make sense to sum values over a dimension. For example:

If you don't know how many years are in a dataset, the total across years doesn't mean anything.
If your data contains overlapping categories to support differing standards, e.g. ages 15-24 and 15-35, summing over these would lead to double-counting.

You can mark a column as non-aggregatable by un-checking the Can aggregate check-box. It is checked by default as most columns are fine to aggregate over.

This will mean that indicators from this dataset will automatically have a filter for this subindicator group, unless this group is used as the indicator variable (in which case it is already disaggregated).

Unlike user-added filters, filters added to disaggregate non-aggregatable columns cannot be removed.
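The default summing behaviour described in this section, and the double-counting problem, can be illustrated with a small example. The following is a Python sketch with made-up rows and counts; it mirrors the description above and is not Wazimap's own aggregation code.

```python
from collections import defaultdict

# Hypothetical rows: (financial year, income source, count).
ROWS = [
    ("2019", "Grants", 10),
    ("2019", "Rates", 5),
    ("2020", "Grants", 12),
    ("2020", "Rates", 8),
]


def totals_by(rows, index):
    """Sum the count (last item) of each row, grouped by the column at `index`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[index]] += row[-1]
    return dict(totals)


# Default behaviour: disaggregate by financial year, summing over income source.
by_year = totals_by(ROWS, 0)

# Overlapping categories: people aged 15-24 are also counted in 15-35,
# so summing over this column counts them twice.
AGE_ROWS = [("15-24", 100), ("15-35", 180)]
double_counted = sum(count for _, count in AGE_ROWS)
```

Here by_year holds one total per financial year, while double_counted is larger than the real population of the widest age band, which is why such a column should be marked as non-aggregatable.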

Creating Universes

A universe refers to the population against which the variable is being applied. The universe can also be left blank, in which case the variable applies to everyone.

To create universes, you will need to write a bit of JSON.

The structure of this is:

See below for an example of youth age range universe.

This universe when applied to a variable, would include all people within the ages of 15 to 35.

A few notes:

It doesn't need to be an array - a single value will also work (e.g. {"Gender": "Female"})
It is case sensitive, so remember to match the case used in the file
You can have multiple filters
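The notes above can be illustrated by applying a universe to a few rows. The following is a Python sketch only: the function name in_universe, the Age values, and the plain column-name keys for the array form are assumptions based on the {"Gender": "Female"} example, and the exact JSON that Wazimap expects should be confirmed in the admin interface.

```python
import json


def in_universe(row, universe):
    """Return True if the row satisfies every filter in the universe."""
    for column, allowed in universe.items():
        # A single value is treated the same as a one-item array.
        values = allowed if isinstance(allowed, list) else [allowed]
        # The comparison is case sensitive, matching the note above.
        if row.get(column) not in values:
            return False
    return True


# Single-value form, taken from the note above.
single = json.loads('{"Gender": "Female"}')

# Multiple filters, one of them an array (hypothetical column values).
youth_women = json.loads('{"Age": ["15-24", "25-35"], "Gender": "Female"}')
```

An empty universe ({}) matches every row in this sketch, which corresponds to leaving the universe blank.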

Creating Variables

Variables are data points and can also be grouped for aggregate fields. These form the basis for the Profile Administrator to create Profile Indicators from.

Create new variable(s)

To create a new variable, first ensure that the dataset file is uploaded and that it has been processed on the system. If this is the case, proceed to Variables in the admin system and select Add to create a new one.

First, select the dataset this variable is found in, from the dropdown list and continue by clicking the Save and continue editing button.
Select which field(s) to group the variable by - you will notice that these are the data columns in your dataset file. These become sub-indicators. You will typically want one or perhaps two of these.
Give your variable a meaningful name.
Click Save.

Variables are extracted as a background process and you will be alerted once they are complete.

Repeat this for as many variables as you need to create from your dataset and repeat for all your dataset files you have uploaded.

Once the variables have been created, the Profile Administrator can now proceed with creating and configuring the rest of the site.

Creating Point Collections

Point collections refer to a number of different locations (points on a map) and are typically grouped by a specific subject matter, for example a dataset of schools in South Africa.

Collections are created by uploading a csv file and associating the file with a theme and sub-theme.

Preparing the points dataset

Prepare a csv file containing at least the following fields: Name, longitude, latitude. You can see an example below:

Name,longitude,latitude
BOTSHABELO,26.7160600000,-29.2362000000
KHUBUSIDRIFT,27.6238400000,-32.5681900000
STUTTERHEIM,27.4274100000,-32.5711800000
MOTHERWELL,25.5841900000,-33.7966400000
KWADWESI,25.5234700000,-33.8410900000
CRADOCK,25.6257100000,-32.1756000000

In addition to these fields, various other fields can also be included as attributes for the point. These are shown on the point tooltip. An example of a file with additional attributes is shown below:

Name,Ward,Phase,Sector,Ward_ID,District,Province,Unnamed: 0,SpecialNeed,Municipality,StreetAddress,latitude,longitude
Sol Plaatjie Full Service School,,PRIMARY SCHOOL,PUBLIC,.,,,0,No,,.,0.0,0.0
THUSEGO INTERMEDIATE SCHOOL,34501005.0,COMBINED SCHOOL,PUBLIC,34501003,DC45,NC,1,No,NC451,"TSINENG VILLAGE, TSINENG, MOTHIBISTAD, 8460",-27.0953,23.0884
!XUNKHWESA COMBINED SCHOOL,30901030.0,SECONDARY SCHOOL,PUBLIC,30901027,DC9,NC,2,No,NC091,"SCHMIDTSDRIFT NEDERSETTING, PLATFONTEIN, KIMBERLEY, 8300",-28.70804,24.65462
BANKSDRIF SECONDARY SCHOOL,30904005.0,SECONDARY SCHOOL,PUBLIC,30904009,DC9,NC,3,No,NC094,"WATER AFFAIRS MEN'S HOSTEL, , HARTSWATER, 8570",-27.7194,24.8092
BARKLY WES PRIMÊRE SKOOL,30902002.0,PRIMARY SCHOOL,PUBLIC,30902002,DC9,NC,4,No,NC092,"IRIS STREET, DEBEERSHOOGTE, BARKLY WEST, 8375",-28.5236,24.5093
BARKLY WEST HIGH SCHOOL,30902002.0,COMBINED SCHOOL,PUBLIC,30902002,DC9,NC,5,No,NC092,"DAHLIA STREET, DEBEERSHOOGTE, BARKLY WEST, 8375",-28.5262,24.5096
BARKLY WEST PRIMARY SCHOOL,30902001.0,PRIMARY SCHOOL,PUBLIC,30902001,DC9,NC,6,No,NC092,"2099 MAKENA STREET, MATALENG, BARKLY WEST, 8375",-28.535,24.5007
BEACON PRIMARY SCHOOL,30901028.0,PRIMARY SCHOOL,PUBLIC,30901028,DC9,NC,7,No,NC091,"VERA STREET, COLVILLE, KIMBERLEY, 8300",-28.7102,24.7549
BOITSHOKO PRIMARY SCHOOL,30901013.0,PRIMARY SCHOOL,PUBLIC,30901013,DC9,NC,8,No,NC091,"179 SESING STREET, GALESHEWE, KIMBERLEY, 8345",-28.7186,24.7433
BONTLENG PRIMARY SCHOOL,,PRIMARY SCHOOL,PUBLIC,.,,,9,No,,"1716 BOJOSI STREET, , PAMPIERSTAD, 8566",0.0,0.0

Include the best identifier or identifying information

It's a good idea to include a standard identifier for each point in your dataset. This can make future updates and cross-referencing much easier. For example, for official facilities like public schools in South Africa, use their EMIS number.

Format of the fields

The "Name" column of your point collection cannot be a number data type. It may be necessary to convert the column to a text/string format before uploading. This can be done easily in Microsoft Excel and is illustrated in the figure below.

NOTE: It may sometimes be more suitable to change which column is given the "Name" heading as this is what is displayed in the tooltip on the map. In the screenshot below, the "Description" column would be a more appropriate choice to be made the "Name" column.

Uploading points and assigning to themes

Once the file is ready to be uploaded, navigate to Point Collections and click Add which will allow you to name the collection and upload the file.

Select the profile which should be associated with these points and select the theme they belong to. Provide a label and the source of the data and proceed to upload.

Uploading additional Points data

Uploads to existing point collections add data without replacing existing points.

To add additional points to a point collection, prepare the file with the new points to be added, as outlined in the Preparing the points dataset section.

Select the point collection you would like to update, upload the file with the new additional point data, and proceed to save the changes.

Editing Points data

You can edit information for a specific point by selecting it under Points > Locations.

To rectify, revise, or update previously uploaded data in bulk, delete the existing point collection and recreate it from an adjusted file containing all the data.

Bulk updates to a points dataset

Long term maintenance to a point dataset often involves the following tasks:

removing points that are no longer relevant
adding points that are not already in the database
updating data about points already in the database (e.g. opening hours, services provided, correcting/improving a description)

To do this, you need to be able to compare your updated data to the data you have already uploaded to a collection in Wazimap.

To match incoming data to your existing data, you need an identifier. Failing that, you will need to use whatever other identifying information you have available.

Identifiers

An identifier is a value which uniquely and consistently identifies an object - in this case, a point in your point collection.

It is important to include some kind of identifier in your point data to facilitate updates to the data. Downstream users may also need to be able to rely on your identifiers being unique and consistent to be able to incorporate your updates if they keep a copy of your point data.

If your data does not have an official identifier that is consistent over time, and unique per point, think about how you will check if you have any duplicates and check whether newly-provided points are already in your database - e.g. by name, address, and so on. Think about what your users will be able to provide you, and include that in what is shown to them as well.

Consider making up and maintaining your own unique consistent identifier. We suggest using UUIDs because they are globally unique, and not just unique within one table. This is important in case you want to combine tables later, e.g. if you want to merge "public schools" and "private schools" into one "schools" table and continue to use your unique identifier.

You can use the following formula in Excel to create a UUID:

=CONCATENATE(
    DEC2HEX(RANDBETWEEN(0;4294967295);8);"-";
    DEC2HEX(RANDBETWEEN(0;42949);4);"-";
    DEC2HEX(RANDBETWEEN(0;42949);4);"-";
    DEC2HEX(RANDBETWEEN(0;42949);4);"-";
    DEC2HEX(RANDBETWEEN(0;4294967295);8);
    DEC2HEX(RANDBETWEEN(0;42949);4)
)

When filled down a column, these identifiers will look like

Make sure to copy and paste as text (not the formula) into a new column and use the text version of it going forward. You don't want it to calculate a new random UUID for existing points.

How to check for duplicates

In Workbench

add a deduplication step
select a column whose values probably ought to be unique
look for rows where duplicate number is greater than 1
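The same check can be done with a short script. The following is a minimal Python sketch, assuming the points have been loaded as a list of dictionaries (for example with csv.DictReader); the function name find_duplicates and the sample values are hypothetical.

```python
from collections import Counter


def find_duplicates(rows, column):
    """Return the values that occur more than once in `column`, with their counts."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up sample rows; EMIS is the identifier column in this example.
sample = [
    {"Name": "School A", "EMIS": "1001"},
    {"Name": "School B", "EMIS": "1002"},
    {"Name": "School A (copy)", "EMIS": "1001"},
]
duplicates = find_duplicates(sample, "EMIS")
```

Any value returned here appears on more than one row and should be reviewed before uploading.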

How to merge in updates

If you have a consistent unique identifier:

In Excel, use VLOOKUP()
In Workbench, perhaps Join Tabs might work

If you have messy data with different capitalisation and potential spelling mistakes:

Try using OpenRefine with CSV reconciliation
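Where a consistent unique identifier exists, the comparison between existing and incoming data can also be scripted. The following is a minimal Python sketch, assuming both datasets are lists of dictionaries that share an identifier column; the function name compare_by_identifier and the column name uuid are assumptions for illustration.

```python
def compare_by_identifier(existing, incoming, key="uuid"):
    """Split incoming rows into new and changed, and list existing rows gone stale."""
    old = {row[key]: row for row in existing}
    new = {row[key]: row for row in incoming}
    return {
        # In the incoming data but not yet in the collection.
        "new": [new[k] for k in new if k not in old],
        # In the collection but no longer in the incoming data.
        "stale": [old[k] for k in old if k not in new],
        # In both, but with at least one differing value.
        "changed": [new[k] for k in new if k in old and new[k] != old[k]],
    }


# Made-up sample data.
existing = [
    {"uuid": "a", "Name": "Clinic A", "Hours": "08:00-16:00"},
    {"uuid": "b", "Name": "Clinic B", "Hours": "08:00-16:00"},
]
incoming = [
    {"uuid": "a", "Name": "Clinic A", "Hours": "08:00-17:00"},
    {"uuid": "c", "Name": "Clinic C", "Hours": "09:00-15:00"},
]
result = compare_by_identifier(existing, incoming)
```

The three lists correspond to the maintenance tasks listed earlier: points to add, points to remove, and points whose details need updating.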

How to check for new records to import

How to check for stale records

Creating a Profile Highlight

Here are the steps to create a Profile Highlight (as shown in the image below):

STEP 3: Under the Profile Menu (in Django), select Add New Profile Highlight, and it will redirect you to the page, as shown below.

STEP 4: Select your Profile from the drop-down list.

STEP 5: Select the Variable you previously created from your dataset.

STEP 6: Add a label or title for the Profile Highlight.

STEP 7: Select the Sub-Indicator to be displayed.

STEP 8: Under Denominator, select Absolute Value.

STEP 9: Scroll down, hit Save, and refresh the Wazi Profile. Your Profile Highlight should now show the value for the selected geography.

Creating Profile Indicators

When to use

A profile indicator should be used when you want to display an indicator to the user. A profile indicator is presented as a bar chart in the rich data view and is also available under the data mapper.

New Profile Indicators

To create new profile indicators, log into the backend admin section with your admin account and click Add next to Profile indicators.
Select the profile to associate this indicator with (there might only be a single one).
Select the variable on which this is based, from the dropdown list.
For the content type dropdown, if the indicator is based on qualitative data select HTML, otherwise leave it as Indicator for a chart type indicator.
Provide a meaningful name for this profile indicator in the label field - this is what users will see.
Select the category and sub-category this indicator will be housed under (shown in both the Data Mapper and the Rich Data view).
Add a textual description of the indicator - this will be shown in the Rich Data view just below the relevant graph.
Select the choropleth method to be used - either sibling or sub-indicator. This determines what is used as the denominator when rendering the choropleth. The sibling method uses the sum across geographies of the same type (e.g. when viewing the number of households in WC, this would compare WC to the sum of all provinces). The sub-indicator method tallies up the values for the children of the current geography level (e.g. households in WC would tally up households for all districts in WC).
Sub-indicators will be shown to you once you have saved the indicator for the first time. Sub-indicators can be reordered by dragging and dropping them.
Confirm the indicator is now visible on the frontend by doing a hard refresh (ctrl-shift-r).

Display configuration

Creating a Profile Key Metric

This page is a stub - please add documentation

Managing Categories and Sub-Categories

Categories can be created and managed by navigating to Indicator Categories in the admin site.

Provide a name for the category
Select the profile to be associated with this category (default is Youth Explorer but there might be others in the future)
Provide descriptive text explaining what the category contains and any other details relevant to the users. This is displayed in the Rich Data View.

Ordering

Categories, subcategories, and profile indicators can be reordered by dragging the handle in the ordering column.

The ordering handle is only available when sorted by that column.

Managing Point Themes and Profile Collections

Point themes and Profile Collections are vital in displaying data using the Point Mapper on Wazimap. Themes are similar to the Indicator Categories in the Data Mapper while Profile Collections are similar to the Profile Indicators of the Data Mapper.

Themes

Themes can be created and managed by navigating to Themes under the Points section of the Wazimap-NG admin page.

Select which profile you want to create the theme for
Provide a name for the theme
Select an appropriate icon to display

Theme colour is currently fixed according to the order of themes:

Theme colour will be more configurable in a future update. Please let us know what your needs are.

Profile Collections

Profile Collections can be created and managed by navigating to Profile Collections under the Points section of the Wazimap-NG admin page.

Select which profile you would like the Profile Collection to be associated with
Select which Theme you would like the new Profile Collection to fall under
Select the Point Collection that has the data you would like to be represented by the Profile Collection
Decide on the Label for the Profile Collection (this will be the text that is displayed in the front-end)

NOTE: Currently the Profile Collection Icon and Colour functionality is not working. These fields can be left empty.

Profile configuration options

Profile configuration is generally carried out by Wazimap Support.

Please email support@wazimap.co.za to request changes.

Profile access permissions

Public profiles can be viewed by anyone on the internet.

Private profiles can only be viewed by users who are assigned permission for that profile. Users have to login to view that profile.

Curation Concepts

Geography Codes

Below are the Geography Codes that should be used for .csv files uploaded to Wazimap Profiles. Be sure to use the Geography Codes corresponding to your Wazimap Profile's Geography Hierarchy.

Geography Hierarchy = World

Africa

Algeria: DZA
Angola: AGO
Benin: BEN
Botswana: BWA
Burkina Faso: BFA
Burundi: BDI
Cape Verde: CPV
Cameroon: CMR
Central African Republic: CAF
Chad: TCD
Comoros: COM
Congo: COG
Djibouti: DJI
Democratic Republic of Congo: COD
Egypt: EGY
Equatorial Guinea: GNQ
Eritrea: ERI
eSwatini: SWZ
Ethiopia: ETH
Gabon: GAB
Gambia: GMB
Ghana: GHA
Guinea: GIN
Guinea-Bissau: GNB
Ivory Coast: CIV
Kenya: KEN
Lesotho: LSO
Liberia: LBR
Libya: LBY
Madagascar: MDG
Malawi: MWI
Mali: MLI
Mauritania: MRT
Mauritius: MUS
Mayotte: MYT
Morocco: MAR
Mozambique: MOZ
Namibia: NAM
Niger: NER
Nigeria: NGA
Rwanda: RWA
Réunion: REU
Saint Helena: SHN
Sao Tome and Principe: STP
Senegal: SEN
Seychelles: SYC
Sierra Leone: SLE
Somalia: SOM
South Africa: ZAF
South Sudan: SSD
Sudan: SDN
Tanzania: TZA
Togo: TGO
Tunisia: TUN
Uganda: UGA
Western Sahara: ESH
Zambia: ZMB
Zimbabwe: ZWE

Zero-values vs missing data

It is important that users can tell whether a value in an indicator is zero, or missing.

Presenting the gap in the data is often just as helpful or important as presenting the data itself. It would also often misrepresent the facts to present a gap in the data as if it is zero.

On the other hand, it can be very inefficient to try to express every possible zero in a dataset. The simplest way to express zeros in a dataset is to simply have a row for every possible combination of attributes for each geographic area represented in the dataset. That would result in incredibly large datasets, where the value (Count column) of most of the rows would often tend to be zero.

Wazimap tries to support minimally-sized datasets by making some assumptions about the data, while also trying to support the presentation of missing data.

Current behaviour

Cases presented as missing data

On a choropleth plotting hospital beds per 1,000 people in countries in Africa, it would be wrong to plot countries with missing data as having zero beds.

Cases not explicitly presented as zero or missing

No data available for a given subindicator for the selected geography

When a subindicator does not occur in the data for a given geographic area, it will be excluded from the chart.

For example, when years are missing from an indicator on misspending, those years are not shown for that geography.

This behaviour is important in instances where a subindicator group can have a very large variety of different values, and only a small number are applicable to a specific geographic area. For example, election results showing the votes received by a party should only show the parties that contested that geographic area. If parties that did not contest that area were included as subindicators, the chart would include hundreds of irrelevant items.

Some rows available for a subindicator for the selected geography, but not for every combination of filters

When a dataset does not contain explicit zero rows for a certain combination of subindicators, and a filter is applied excluding the available data, the subindicators without data are excluded from the chart.

Cases presented as zero

When a row in the dataset has the Count value of zero, it is of course presented as zero.

In the data mapper, when there is at least one data row for a given geographic area in an indicator, every combination of filters is presented as zero even if there is no row in the dataset for that combination of attribute values.

In the example below, the number of white Tshivenda speakers is explicitly presented as zero, even though there is no such row in the dataset. The data mapper assumes that the data for a geography is complete if there is some row for that geography - perhaps for a different subindicator, or for the selected subindicator but for a different combination of filters.
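The rule described above can be sketched as a small lookup. This is only an illustration of the documented behaviour, not Wazimap's actual implementation; the function name, the geography codes, and the column names are assumptions.

```python
def data_mapper_value(rows, geography, filters):
    """Illustrative lookup: None means 'missing', a number means 'show this value'.

    rows    -- list of dicts with 'Geography', 'Count' and attribute columns
    filters -- dict of attribute column -> selected value
    """
    geo_rows = [row for row in rows if row["Geography"] == geography]
    if not geo_rows:
        # No row at all for this geography: treated as missing data.
        return None
    matching = [
        row for row in geo_rows
        if all(row.get(column) == value for column, value in filters.items())
    ]
    # Some row exists for the geography, so an absent combination counts as zero.
    return sum(row["Count"] for row in matching)


# Hypothetical dataset: there is no row for white Tshivenda speakers in GEO1.
rows = [
    {"Geography": "GEO1", "Language": "Tshivenda", "Race": "Black African", "Count": 950},
    {"Geography": "GEO1", "Language": "English", "Race": "White", "Count": 40},
]

print(data_mapper_value(rows, "GEO1", {"Language": "Tshivenda", "Race": "White"}))  # zero
print(data_mapper_value(rows, "GEO2", {"Language": "Tshivenda", "Race": "White"}))  # missing
```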

Not supported yet

Because of the assumptions Wazimap makes about your data, described above, we don't currently support the following. If there is demand, we can consider adding support, perhaps by making the behaviour configurable per dataset or indicator. We can potentially also help you shape your datasets and indicators to achieve your objectives within the above behaviour.

Explicit missing data in rich data view

The rich data view currently just hides rows without data for a given subindicator or filter selection. It does not support showing a label on a chart axis with a blank space to make the lack of data for that indicator visually explicit.

Partial data in the data mapper

The data mapper currently does not support partial data for a geography. If an indicator has one datum for that geography, all combinations of subindicator and filters will show zeros for that geography where other values are not available.

In these cases, it can be helpful to indicate in the description of your dataset that the data may not be complete, and when it was last updated.

As a workaround, you can show gaps in an entire subindicator by separating a dataset into a dataset and indicator per subindicator. The indicators where no values are available for a geography will then present that geography as blank (grey).
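One way to prepare the data for this workaround is to partition the rows by subindicator before uploading each part as its own dataset. The sketch below assumes rows are already loaded as dictionaries; the helper name, geography codes, and column names are hypothetical.

```python
from collections import defaultdict


def split_by_subindicator(rows, column):
    """Partition dataset rows into one list of rows per value of `column`."""
    datasets = defaultdict(list)
    for row in rows:
        datasets[row[column]].append(row)
    return dict(datasets)


# Hypothetical rows: GEO2 has no value for 2017-2018.
rows = [
    {"Geography": "GEO1", "Financial year": "2018-2019", "Count": 10},
    {"Geography": "GEO1", "Financial year": "2017-2018", "Count": 7},
    {"Geography": "GEO2", "Financial year": "2018-2019", "Count": 4},
]

datasets = split_by_subindicator(rows, "Financial year")

# Geographies that have at least one row in each resulting dataset.
geographies_with_data = {
    name: {row["Geography"] for row in subset} for name, subset in datasets.items()
}
print(geographies_with_data)
```

In the 2017-2018 dataset there is no row for GEO2 at all, so an indicator built on it would have nothing to show for that geography instead of a zero.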

Glossary

Dataset

Subindicator

In Wazimap a Subindicator is what we call an attribute value in one of the classifying columns or dimensions of a dataset.

In OLAP terms, a subindicator corresponds to a member of a dimension.

In statistics, a subindicator corresponds to a category of a categorical variable.

Subindicators are so named because they represent the choices offered when plotting a choropleth.

Subindicator Groups

A subindicator group represents the set of subindicators of a particular attribute or column in the original dataset.

In OLAP terms, a subindicator group corresponds to a dimension.

It corresponds directly to the columns in a dataset, other than the Geography and Count columns.

Variables

Variables are the data points from which profile indicators are created. They are created by the Data Administrator. Multiple variables can be created from the same dataset.

Most of the time, a variable simply exposes a subindicator group for use as categories in an indicator.

Variables exist for more complicated cases, where percentages need to be calculated using a different population than simply the total of all subindicators. This is done by associating a universe with the variable.

Universe

Universe refers to the population to which the indicators are applied. There can be multiple universes if required and it can also be left blank to apply to the entire population.
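As a rough illustration of why the choice of universe matters, the sketch below computes percentages once over all subindicators and once over a narrower population. The counts, the category names, and the way the universe is expressed (a set of included subindicators) are assumptions made for this example; Wazimap's own universe configuration may work differently.

```python
def percentages(counts, universe=None):
    """Percentage per subindicator, relative to the total of the chosen universe.

    counts   -- dict of subindicator -> count
    universe -- optional set of subindicators that make up the population;
                None means the total of all subindicators
    """
    included = counts if universe is None else {
        name: count for name, count in counts.items() if name in universe
    }
    total = sum(included.values())
    return {name: round(100 * count / total, 1) for name, count in included.items()}


# Hypothetical counts for one geography.
counts = {"Employed": 600, "Unemployed": 200, "Not economically active": 200}

whole_population = percentages(counts)
labour_force_only = percentages(counts, universe={"Employed", "Unemployed"})
print(whole_population["Unemployed"], labour_force_only["Unemployed"])
```

The same 200 unemployed people are 20% of everyone counted but 25% of the labour force, which is why the universe should be chosen deliberately.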

Profile Indicator

Profile indicators are created by the Profile Administrator and are presented to the user on the website. Indicators belong to categories (e.g. Demographics) and can belong to sub-categories (e.g. General Population). In addition, they can also have sub-indicators (e.g. Age could take the individual age brackets as sub-indicators).

Key Metric

Key metrics are values of significance as decided upon by the Profile Administrator. These are used to showcase and call out highlighted values both in the rich data (profile) view and on the map view. Key metrics can be shown as a percentage or as absolute numbers, as defined by the Profile Administrator.

Data Mapper

The Data Mapper provides an interface for users to plot indicators on the map. Only indicators available for plotting are shown, and these might change depending on the geography level and the data available for that level. Please note that all indicators are shown in the Rich Data View.

Rich Data View

What was once referred to as a profile view on Wazimap is now the Rich Data View, which provides charted exploration of the available data indicators. This view also reveals the source of each indicator along with a description for the categories and indicators (optional and set by Profile Administrators). This view also allows a chart to be downloaded (to be used elsewhere) and will soon allow for a chart to be embedded and for data to be downloaded. The Rich Data View also supports a print-friendly view allowing for easier sharing and dissemination.

The point menu houses point data themes and collections and allows for these to be overlaid on the map.

Point Data

Point data refers to coordinate-based data rather than a dataset shaped within a geographical boundary. This allows for points to be overlaid on various other indicators.

Common practices

General

Sources

Label data sources as {{ dataset title }} - {{ organisation }} so that users can more easily find the right data source.

Link to the official page about the actual dataset, if one exists, otherwise to the homepage of the source organisation.
Prefer writing source names in full, rather than abbreviated, e.g. Our World in Data rather than OWID.
See if they have a preferred way to be cited and try to use that.

Number formatting

Percentage raw data

Select Value as the default between Value/Percentage presentation in the chart/table
Disable the Value/Percentage toggle

Ratios

Use "" as the format string, that is, do not use SI-unit formatting. With SI formatting, 0.789 is displayed as 789m, and people will think it means millions.

Decimals

If you use format strings that format with SI units, configure no more than 2 decimal places or 3 significant digits. Otherwise numbers can end up presented as 12.345k, and people will misread the . as a thousands separator because of the three decimal places and think this is 12 345 000.
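The two pitfalls above are easy to reproduce. The function below is a rough Python emulation of SI-prefix formatting, written only to show where strings like 789m and 12.345k come from. It is not the formatter Wazimap uses, and `si_format` and its parameter are hypothetical.

```python
import math

# SI prefixes keyed by power of one thousand.
SI_PREFIXES = {-2: "µ", -1: "m", 0: "", 1: "k", 2: "M", 3: "G"}


def si_format(value, significant_digits=3):
    """Format a number with an SI prefix and the given number of significant digits."""
    if value == 0:
        return "0"
    # Pick the power of 1000 that leaves 1 to 3 digits before the decimal point.
    exponent = math.floor(math.log10(abs(value)) / 3)
    exponent = max(min(exponent, 3), -2)
    scaled = value / 10 ** (3 * exponent)
    digits_before_point = len(str(int(abs(scaled))))
    decimals = max(significant_digits - digits_before_point, 0)
    return f"{scaled:.{decimals}f}{SI_PREFIXES[exponent]}"


print(si_format(0.789))     # a ratio rendered with the "milli" prefix
print(si_format(12345, 5))  # too many digits: looks like a thousands separator
print(si_format(12345, 3))  # three significant digits is easier to read
```

With three significant digits the last value reads as 12.3k, which is much harder to mistake for twelve million.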

SANEF election dashboard

Formatting standards for display purposes

Number of people

Ensure any data pertaining to people are rounded to the nearest whole number.
Do not use "M" and "k". Use full figures with thousands-separator.
Examples:
- data entry refers to 104678.9 people --> this should be displayed as 104,679.
- data entry refers to 109987789.49 people --> this should be displayed as 109,987,789.
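A minimal sketch of this rule in Python, assuming values arrive as plain numbers. `format_people` is a hypothetical helper; it rounds halves up explicitly, because Python's built-in `round` rounds halves to the nearest even number.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_people(value):
    """Round to the nearest whole person and add thousands separators."""
    whole = Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return f"{int(whole):,}"


print(format_people(104678.9))
print(format_people(109987789.49))
```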

Financial years

Full values

Prefer full financial years, as in 2016-2017 rather than 2017 or 2016-17. This is because:

Most people don't know that in the municipal sphere (unlike national) 2017 means 2016-2017
Many people won't realise that the -17 means the next year - writing it in full is much easier to understand correctly the first time.

Subindicator ordering

Order financial years in subindicator groups in reverse chronological order - that is, 2018-2019 and then 2017-2018. People are most often interested in the latest financial year.
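When preparing a dataset, both conventions can be applied with a couple of small helpers. This is a sketch with hypothetical function names, assuming the source data labels each financial year by its end year (so 2017 means 2016-2017).

```python
def full_financial_year(end_year):
    """Expand an end-year label such as 2017 into the full form 2016-2017."""
    return f"{end_year - 1}-{end_year}"


def order_latest_first(labels):
    """Sort full financial year labels in reverse chronological order."""
    return sorted(labels, key=lambda label: int(label.split("-")[0]), reverse=True)


labels = [full_financial_year(year) for year in (2017, 2019, 2018)]
print(order_latest_first(labels))
```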

Non-aggregatable

Always mark the financial year column as non-aggregatable. Summing over financial years usually doesn't make sense for an arbitrary number of financial years and can easily lead to surprises for users.

Default filter

When the financial year column is not the variable, add a default filter to match the latest financial year. Marking financial year as non-aggregatable will already add a filter for it, but adding default filter configuration will ensure that a sensible value is selected in the filter.

Age bands

The reasoning for our preferred age bands is as follows:

Ideally bands should align with voting age
Ideally bands should align across datasets e.g. demographics should align with election data
perhaps 0-17, 18-19, 20-29, 30-39, 40-49, 50-59, 60+

Africa Data Hub

These are the steps needed to generate datasets for Africa Data Hub (ADH).

The ADH scripts can be found at https://github.com/OpenUpSA/wazimap-adh-data under the folder COVID Countries and Africa Admin 1.

Collection data

Download owid-covid-latest.csv from https://github.com/owid/covid-19-data/tree/master/public/data

Download from https://github.com/dsfsi/covid19za/tree/master/data the following files: covid19za_provincial_cumulative_timeline_confirmed.csv, covid19za_provincial_cumulative_timeline_deaths.csv, covid19za_provincial_cumulative_timeline_recoveries.csv

Download from https://data.humdata.org/organization/hera-humanitarian-emergency-response-africa all the CSV files (for the various countries) whose names end with Coronavirus (Covid-19) Subnational, e.g. Niger: Coronavirus (Covid-19) Subnational. Save all the datasets from this source in a folder named HERA.

For each CSV file in the HERA folder, change the column names from

['CONTAMINES', 'DECES', 'GUERIS', 'CONTAMINES_FEMME', 'CONTAMINES_HOMME', 'CONTAMINES_GENRE_NON_SPECIFIE']

Save these datasets under the folder wazimap-adh-data-main/COVID Countries and Africa Admin 1.

After downloading the datasets, run the notebooks in the following order:

Admin0_dataTransformation_v2.ipynb

Africa_Admin1_dataTransformation.ipynb

Data_Aggregation.ipynb

Data_Aggregation.ipynb will generate cases_monthly and death_monthly.

Data handling tips

Phone numbers in CSVs

Cells and columns in CSV files don't have well-defined types, so programs reading those files generally infer the type from the values.

This can be a problem when opening a file with phone numbers which often start with a zero, and look like a number to programs reading CSVs.

When reading a CSV file, see if you can specify that such columns should be read as Text rather than letting the program infer the type.

After reading, if the columns were read as text, the leading zeros will remain. If the program read them as numbers, the leading zeros will have been lost.
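The difference is easy to see with Python's standard `csv` module, which always reads values as text. The file contents and the phone number below are made up for illustration.

```python
import csv
import io

# A tiny CSV with a phone number that starts with a zero (made-up data).
csv_text = "name,phone\nClinic A,0215551234\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Read as text: the leading zero is preserved.
as_text = rows[0]["phone"]

# Converted to a number, as a type-inferring program would do: the zero is lost.
as_number = int(rows[0]["phone"])

print(as_text, as_number)
```

If you use pandas instead, passing `dtype=str` to `read_csv` keeps such columns as text.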

To check that the file was saved correctly, open it in a text editor such as Notepad and confirm that the leading zeros are still there.
