5.2 How to turn numbers into stories
When mining data for stories, it is important to think about data as a journalist, not an analyst. The process of engaging with data from a journalistic perspective is known as interviewing the data.
“What has changed… is the explosion in data sources readily available on the web, which can both aid in telling important and necessary stories, but can also be easily misunderstood and potentially manipulated.
It’s more important than ever for citizens to develop new skills to use these data sources effectively. And just as that technology can aid us in finding the data, it can also allow us to share data with readers and build a community of curious people looking to explore the sources with us. There are so many stories buried in the details - more than we could ever hope to find on our own.”
Melissa Bell, vox.com
Correlation does not equal causation!
For example, data showing that women tend to have lower salaries than men does not, on its own, mean that women earn less money because they are women.
What other factors could have a contributing effect?
Experience
Field (science, sports, IT)
Academic qualifications
Age, etc.
If you identify a correlation between variables, consult a domain expert for an explanation.
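The point about contributing factors can be illustrated with a small sketch. The records below are invented purely for illustration, and the helper name `mean_salary_by` is hypothetical. The sketch compares average salary by gender alone, then by gender within each field.

```python
from collections import defaultdict
from statistics import mean

# Invented records for illustration only: (gender, field, salary).
records = [
    ("F", "IT", 50000),
    ("M", "IT", 50000),
    ("M", "IT", 50000),
    ("M", "IT", 50000),
    ("F", "Teaching", 30000),
    ("F", "Teaching", 30000),
    ("F", "Teaching", 30000),
    ("M", "Teaching", 30000),
]


def mean_salary_by(rows, key):
    """Group rows by key(row) and return the mean salary of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: mean(salaries) for group, salaries in groups.items()}


# Looking at gender alone suggests a pay gap.
overall = mean_salary_by(records, key=lambda r: r[0])

# Splitting by field as well shows where the gap comes from in this toy data.
by_field = mean_salary_by(records, key=lambda r: (r[1], r[0]))

print("By gender:", overall)
print("By field and gender:", by_field)
```

In this made-up data the overall averages differ (35000 for women, 45000 for men), yet within each field the averages are identical: the gap comes entirely from how people are distributed across fields. Real data is rarely this tidy, which is why a domain expert is still needed to explain what you find.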
Having a clear question at the start of the whole process helps ensure you don’t lose your focus along the way.
The pyramid (the five steps below) begins with a large amount of information, which becomes increasingly focused as you drill down into it, until you reach the point of communicating the results.
Step 1 - Compile
Data-driven storytelling “begins in one of two ways: either you have a question that needs data, or a dataset that needs questioning. Whichever it is, the compilation of data is what defines it” - Paul Bradshaw
Compiling data might take the form of:
Data supplied directly by an organisation
Data found using advanced search techniques
Data compiled by scraping hidden online databases
Data compiled by converting documents into spreadsheet format
Data obtained by pulling information from APIs
Data collected through observation, surveys, crowdsourcing, etc.
The most important stage
Everything else rests on this
The stage returned to the most
At each subsequent stage, you may find that you need to compile more information
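As a sketch of one compilation route mentioned in this step, the example below takes a response of the kind an API might return and converts it into spreadsheet-friendly CSV. The payload, its field names (`ward_code`, `municipality`, `clinics`) and the helper `json_records_to_csv` are all invented for illustration. A real API will have its own structure and will usually need a network request first.

```python
import csv
import io
import json

# Hypothetical API response: the field names and values are invented.
api_response = """
[
  {"ward_code": "W001", "municipality": "Example Town", "clinics": 4},
  {"ward_code": "W002", "municipality": "Sample City", "clinics": 7}
]
"""


def json_records_to_csv(json_text, fieldnames):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keep only the requested columns; missing values become empty cells.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
    return buffer.getvalue()


csv_text = json_records_to_csv(
    api_response, ["ward_code", "municipality", "clinics"]
)
print(csv_text)
```

The resulting text can be saved with a `.csv` extension and opened in any spreadsheet program for the later steps.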
Step 2 - Clean
Being confident in the stories hidden within the data means being able to trust the quality of the data – and that means cleaning it.
There are 2 basic types of data cleaning:
Removing human errors
Converting data into the format that you require
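Both types of cleaning can be sketched in a few lines. The rows below are invented, and the helpers `clean_name` and `parse_number` are hypothetical names. The first fixes typing inconsistencies in a name, the second converts a number stored as text into a real number.

```python
def clean_name(raw):
    """Collapse stray whitespace and normalise capitalisation."""
    return " ".join(raw.split()).title()


def parse_number(raw):
    """Convert text such as ' 1,200 ' or '1 200' into an integer."""
    digits = raw.replace(",", "").replace(" ", "")
    return int(digits)


# Invented rows showing typical human errors and text-formatted numbers.
raw_rows = [
    {"municipality": "  example town ", "population": "1,200"},
    {"municipality": "EXAMPLE  TOWN", "population": "1 200"},
    {"municipality": "Sample City", "population": "950"},
]

clean_rows = [
    {
        "municipality": clean_name(row["municipality"]),
        "population": parse_number(row["population"]),
    }
    for row in raw_rows
]

for row in clean_rows:
    print(row)
```

After cleaning, the first two rows carry the same municipality name and a numeric population, so they can be compared or totalled. The capitalisation rule here is deliberately crude and will not suit every place name; treat it as a starting point, not a complete cleaning routine.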
Step 3 - Context
Like any source for a story, data needs to be questioned: who gathered it, when, for what purpose, and how was it gathered?
Data can’t always be trusted!
If you began this process with a clear question, it will help you keep your focus at this point.
Step 4 - Combine
You will often need to combine different sources of data using a common field, such as a unique identifier (ward key/code), a name (municipality) or, classically, a location.
Combining different datasets enriches the data, giving you both additional information and additional sources to work with.
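A minimal sketch of combining two datasets on a shared ward code follows. Both datasets and the helper `combine_on_key` are invented for illustration, and the sketch assumes each ward code appears only once in the second dataset.

```python
# Invented datasets that share a "ward_code" field.
population = [
    {"ward_code": "W001", "population": 1200},
    {"ward_code": "W002", "population": 950},
    {"ward_code": "W003", "population": 400},
]
clinics = [
    {"ward_code": "W001", "clinics": 4},
    {"ward_code": "W002", "clinics": 7},
]


def combine_on_key(left, right, key):
    """Keep every left row, adding right-hand fields where the key matches.

    Returns the combined rows and the list of keys that found no match.
    Assumes each key value appears at most once in `right`.
    """
    right_by_key = {row[key]: row for row in right}
    combined = []
    unmatched = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            unmatched.append(row[key])
            combined.append(dict(row))
        else:
            combined.append({**row, **match})
    return combined, unmatched


combined, unmatched = combine_on_key(population, clinics, "ward_code")
for row in combined:
    print(row)
print("No clinic data for:", unmatched)
```

Reporting the unmatched keys is a deliberate choice: rows that fail to match often point to a cleaning problem (a mistyped code, a renamed ward) or to a gap in one source that is worth asking about.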
Step 5 - Communicate
What is your narrative?
What visualisations will you use?
What case studies can you use?
Don’t forget the storytelling!
Give your story a human face or a name, so that it has personal relevance for the reader