Zero-values vs missing data

It is important that users can tell whether a value in an indicator is zero or missing.

Presenting a gap in the data is often as helpful, or as important, as presenting the data itself. Presenting a gap as if it were zero would often misrepresent the facts.

On the other hand, it can be very inefficient to express every possible zero in a dataset. The most straightforward way to do so is to include a row for every possible combination of attribute values for each geographic area in the dataset. That would result in very large datasets in which the value (the Count column) of most rows is zero.
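As a rough illustration of how quickly a fully explicit dataset grows, the sketch below counts the rows needed to list every combination of attribute values for every geography, zeros included. The function name, the attributes and the figure of 4,000 geographies are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def dense_row_count(num_geographies: int, attributes: Dict[str, List[str]]) -> int:
    """Rows needed to state every combination of attribute values explicitly
    for every geography, including combinations whose Count is zero.

    Hypothetical helper for illustration; not part of Wazimap.
    """
    count = num_geographies
    for values in attributes.values():
        count *= len(values)
    return count


# Hypothetical attributes for an illustrative census-style dataset.
attributes = {
    "Language": ["isiZulu", "isiXhosa", "Afrikaans", "English", "Tshivenda"],
    "Population group": ["Black African", "Coloured", "Indian or Asian", "White"],
    "Age group": ["0-14", "15-34", "35-64", "65+"],
}

print(dense_row_count(4000, attributes))  # 4000 * 5 * 4 * 4 = 320000
```

A sparse dataset only needs rows for the combinations that actually occur in each geography, which is usually a small fraction of this number.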

Wazimap tries to support minimally-sized datasets by making some assumptions about the data, while also trying to support the presentation of missing data.

Current behaviour

Cases presented as missing data

Wazimap presents a value for a geographic area as "missing" when there are no rows of data for that geographic area in the dataset backing an indicator. As soon as there are one or more rows for a geography in an indicator, all subindicator and filter combinations for that geography are presented as zero instead of missing. See "Cases presented as zero" below.

For example, on a choropleth plotting hospital beds per 1,000 people in countries in Africa, it would be wrong to plot countries with missing data as having zero beds.
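A minimal sketch of the rule above, assuming each dataset row is a Python dict with a "Geography" key. The function name and row layout are hypothetical and this is not Wazimap's implementation; it only shows that a geography counts as missing when it has no rows at all, even if other geographies have explicit zeros.

```python
from typing import Dict, List


def missing_geographies(rows: List[Dict], all_geographies: List[str]) -> List[str]:
    """Return the geographies that have no rows in an indicator's dataset.

    Under the rule described above these are presented as missing, while any
    geography with at least one row is treated as having data.
    """
    with_data = {row["Geography"] for row in rows}
    return [geo for geo in all_geographies if geo not in with_data]


# Hypothetical rows with placeholder values.
rows = [
    {"Geography": "Country A", "Count": 12},
    {"Geography": "Country B", "Count": 0},  # an explicit zero still counts as data
]
print(missing_geographies(rows, ["Country A", "Country B", "Country C"]))
# ['Country C']
```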

Cases not explicitly presented as zero or missing

No data available for a given subindicator for the selected geography

When a subindicator does not occur in the data for a given geographic area, it will be excluded from the chart.

For example, in an indicator on misspending, years that have no data for a geography are not shown for that geography.

This behaviour is important when a subindicator group can have a very large variety of values, only a small number of which apply to a specific geographic area. For example, election results showing the votes received by each party should only show the parties that contested that geographic area. If parties that did not contest the area were included, the chart would include hundreds of irrelevant items.
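The sketch below illustrates this behaviour, assuming rows are dicts with "Geography", "Count" and a subindicator column (here a hypothetical "Party" column). Only values of the subindicator group that occur in a geography's rows get an entry, so parties that did not contest a ward never appear in its chart. The function, column names and numbers are illustrative, not Wazimap's code or real results.

```python
from typing import Dict, List


def chart_values(rows: List[Dict], geography: str, group: str) -> Dict[str, int]:
    """Total Count per subindicator for one geography.

    Subindicators with no rows for this geography get no entry at all,
    so they are left out of the chart rather than drawn as zero.
    """
    totals: Dict[str, int] = {}
    for row in rows:
        if row["Geography"] == geography:
            key = row[group]
            totals[key] = totals.get(key, 0) + row["Count"]
    return totals


# Hypothetical election results: Party C only contested Ward 2.
rows = [
    {"Geography": "Ward 1", "Party": "Party A", "Count": 500},
    {"Geography": "Ward 1", "Party": "Party B", "Count": 300},
    {"Geography": "Ward 2", "Party": "Party C", "Count": 200},
]
print(chart_values(rows, "Ward 1", "Party"))
# {'Party A': 500, 'Party B': 300}
```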

Some rows available for a subindicator for the selected geography, but not for every combination of filters

When a dataset does not contain explicit zero rows for certain combinations of subindicator and filter values, and a filter is applied that excludes all the available rows for a subindicator, that subindicator is excluded from the chart rather than shown as zero.
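In this minimal sketch (hypothetical function and column names, not Wazimap's code), rows that don't match every selected filter value are skipped before totals are computed, so a subindicator whose rows are all filtered out drops out of the result instead of appearing with a zero.

```python
from typing import Dict, List


def filtered_chart_values(
    rows: List[Dict], geography: str, group: str, filters: Dict[str, str]
) -> Dict[str, int]:
    """Total Count per subindicator for one geography after applying filters.

    A subindicator whose rows are all excluded by the filters has no entry,
    so it is left out of the chart rather than shown as zero.
    """
    totals: Dict[str, int] = {}
    for row in rows:
        if row["Geography"] != geography:
            continue
        if any(row.get(column) != value for column, value in filters.items()):
            continue  # row does not match the selected filter values
        key = row[group]
        totals[key] = totals.get(key, 0) + row["Count"]
    return totals


# Hypothetical rows: there is no explicit zero row for female Tshivenda speakers.
rows = [
    {"Geography": "Ward 1", "Language": "English", "Sex": "Female", "Count": 40},
    {"Geography": "Ward 1", "Language": "English", "Sex": "Male", "Count": 35},
    {"Geography": "Ward 1", "Language": "Tshivenda", "Sex": "Male", "Count": 10},
]
print(filtered_chart_values(rows, "Ward 1", "Language", {"Sex": "Female"}))
# {'English': 40}
```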

Cases presented as zero

When a row in the dataset has the Count value of zero, it is of course presented as zero.

In the data mapper, when there is at least one row for a given geographic area in an indicator, any combination of filters without a matching row is presented as zero, even though there is no row in the dataset for that combination of attribute values.

In the example below, the number of white Tshivenda speakers is explicitly presented as zero, even though there is no such row in the dataset. The data mapper assumes that the data for a geography is complete if there is any row for that geography, perhaps for a different subindicator, or for the selected subindicator but with a different combination of filters.
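A minimal sketch of this assumption, using hypothetical column names and placeholder counts (not Wazimap's code): once a geography has any row, a combination with no matching row sums to zero, and only a geography with no rows at all comes back as missing (None).

```python
from typing import Dict, List, Optional


def mapper_value(
    rows: List[Dict], geography: str, selection: Dict[str, str]
) -> Optional[int]:
    """Value the data mapper would show for one geography and selection.

    Returns None (missing) if the geography has no rows at all; otherwise the
    sum of matching Counts, which is 0 when no row matches the selection.
    """
    geo_rows = [row for row in rows if row["Geography"] == geography]
    if not geo_rows:
        return None  # no data for this geography: presented as missing
    return sum(
        row["Count"]
        for row in geo_rows
        if all(row.get(column) == value for column, value in selection.items())
    )


# Hypothetical rows: no row exists for white Tshivenda speakers in Ward 1.
rows = [
    {"Geography": "Ward 1", "Language": "Tshivenda", "Population group": "Black African", "Count": 120},
    {"Geography": "Ward 1", "Language": "English", "Population group": "White", "Count": 30},
]
selection = {"Language": "Tshivenda", "Population group": "White"}
print(mapper_value(rows, "Ward 1", selection))  # 0, presented as zero
print(mapper_value(rows, "Ward 2", selection))  # None, presented as missing
```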

Not supported yet

Because of the assumptions Wazimap makes about your data, described above, we don't currently support the following. If there is demand, we can consider adding support, perhaps by making the behaviour configurable per dataset or indicator. We can also help you shape your datasets and indicators to achieve your objectives within the behaviour described above.

Explicit missing data in rich data view

The rich data view currently just hides subindicators without data for the selected geography or filter selection. It does not support showing a label on a chart axis with a blank space to make the lack of data for that subindicator visually explicit.

Partial data in the data mapper

The data mapper currently does not support partial data for a geography. If an indicator has at least one row for a geography, every combination of subindicator and filters without a matching row shows zero for that geography.

In these cases, it can be helpful to indicate in the description of your dataset that the data may not be complete, and when it was last updated.

As a workaround, you can show gaps for an entire subindicator by splitting the dataset into a separate dataset and indicator per subindicator. Indicators with no values for a geography will then present that geography as blank (grey).
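The split can be prepared before upload. The sketch below assumes rows are dicts and groups them by a hypothetical "Year" subindicator column, producing one dataset per year with that column dropped. Geography B has no 2020 rows, so in a 2020-only indicator it would have no data at all and would appear as missing rather than zero. Names and values are illustrative only, not a Wazimap tool.

```python
from typing import Dict, List


def split_by_subindicator(rows: List[Dict], group: str) -> Dict[str, List[Dict]]:
    """Split one dataset into one dataset per value of the subindicator column.

    The subindicator column is dropped from each part, since every row in a
    part would have the same value for it.
    """
    datasets: Dict[str, List[Dict]] = {}
    for row in rows:
        part = {column: value for column, value in row.items() if column != group}
        datasets.setdefault(row[group], []).append(part)
    return datasets


# Hypothetical rows with placeholder counts.
rows = [
    {"Geography": "A", "Year": "2019", "Count": 5},
    {"Geography": "A", "Year": "2020", "Count": 7},
    {"Geography": "B", "Year": "2019", "Count": 3},
]
for year, dataset in split_by_subindicator(rows, "Year").items():
    print(year, sorted({row["Geography"] for row in dataset}))
# 2019 ['A', 'B']
# 2020 ['A']
```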
