Wazimap profile curation handbook
  • Start Here
    • Introduction
  • Point Mapper
    • What is Point Mapper?
    • Shaping Data for Point Collections
    • Uploading Point Collections
    • Creating Themes for Profile Collections
    • Creating Profile Collections from Point Collections
    • Uploading additional points to an existing Point Collection
    • Editing existing Point Data in Django
    • Bulk updates to an existing point collection
    • Navigating Point Mapper
  • Profile Admin
    • Creating Datasets
    • Sub-Indicator groups (columns)
    • Creating Universes
    • Creating Variables
    • Creating Point Collections
    • Creating a Profile Highlight
    • Creating Profile Indicators
    • Creating a Profile Key Metric
    • Managing Categories and Sub-Categories
    • Managing Point Themes and Profile Collections
    • Profile configuration options
  • Curation Concepts
    • Geography Codes
    • Zero-values vs missing data
    • Glossary
  • Common practices
    • General
    • SANEF election dashboard
    • Africa Data Hub
    • Data handling tips
  • Promotion and usage
    • Analytics
Powered by GitBook
On this page
  • Adding Unique Identifiers
  • Checking for Duplicates & Removing them
  • Merging Updates

Was this helpful?

Export as PDF
  1. Point Mapper

Bulk updates to an existing point collection

PreviousEditing existing Point Data in DjangoNextNavigating Point Mapper

Last updated 2 years ago

Was this helpful?

Long term maintenance to a point collection often involves the following tasks:

  1. Removing points that are no longer relevant;

  2. Adding points that are not already in the database; and

  3. Updating data about points already in the database (e.g., opening hours, services provided, correcting/improving a description).

To do this, you need to be able to compare your updated data to the data you have already uploaded to a profile collection in Wazimap NG. The easiest way to do this is to use unique identifiers.

Adding Unique Identifiers

An identifier is a value which uniquely and consistently identifies an object — in this case, a point in your point collection.

It is important to include some kind of identifier in your point data to facilitate updates to the data. Users down the line will rely on your identifiers being unique and consistent to be able to incorporate updates should they download a copy of your point data.

If your data does not have an official identifier that is consistent over time, and unique per point, it will become difficult to check for any duplicates or whether newly-provided points are already in your database (e.g., by name, address, and so on). Think about what your users will be able to provide you, and include that in what is shown to them as well.

You can use the following formula in Google Sheets to create a UUID (Copy this into the appropriate cell):

=CONCATENATE(DEC2HEX(RANDBETWEEN(0;4294967295);8);"-";DEC2HEX(RANDBETWEEN(0;42949);4);"-";DEC2HEX(RANDBETWEEN(0;42949);4);"-";DEC2HEX(RANDBETWEEN(0;42949);4);"-";DEC2HEX(RANDBETWEEN(0;4294967295);8);DEC2HEX(RANDBETWEEN(0;42949);4))

Make sure to copy and paste as text as well (not the formula), into a new column, and use the text version of it going forward. You don't want it to calculate a new random UUID for existing points.

Checking for Duplicates & Removing them

Google Sheets has a built-in function to check for and remove duplicates. In Google Sheets, select the data of interest, and click on Data > Data clean-up > Remove Duplicates (see Figure 20, below). In the box that appears (see Figure 21, below), be sure to select the option Data has header row if it indeed does.

  1. Add a deduplication step.

  2. Select a column whose values probably ought to be unique.

  3. Look for rows where the duplicate number is greater than 1.

Merging Updates

If you have a consistent UUID (unique identifier), then:

If you are using Excel, see . If you are using Workbench, follow the steps below:

In Excel or Google Sheets, use .

In Workbench, might work.

If you have messy or dirty data with different capitalisation and potential spelling mistakes, try using with CSV reconciliation.

conditionally format duplicate values
VLOOKUP
Join Tabs
OpenRefine
Figure 20: Checking for and removing duplicates in Google Sheets
Figure 21: Selecting ‘Data has header row’