The Fundamentals of Data-driven Storytelling

5.2 How to turn numbers into stories


Last updated 2 years ago

When mining data for stories, it is important to think about data as a journalist, not an analyst. The process of engaging with data from a journalistic perspective is known as interviewing the data.

“What has changed… is the explosion in data sources readily available on the web, which can both aid in telling important and necessary stories, but can also be easily misunderstood and potentially manipulated.

It’s more important than ever for citizens to develop new skills to use these data sources effectively. And just as that technology can aid us in finding the data, it can also allow us to share data with readers and build a community of curious people looking to explore the sources with us. There are so many stories buried in the details - more than we could ever hope to find on our OWN.”

Melissa Bell, vox.com

Correlation Vs Causation

Correlation does not equal causation!

For example, a dataset may show a correlation between gender and salary, with women tending to have lower salaries. On its own, that correlation does not show that women earn less because they are women.

What other factors could have a contributing effect?

  • Experience

  • Field (science, sports, IT)

  • Academic qualifications

  • Age, etc.

If you identify a correlation between variables, consult a domain expert for an explanation before reporting it as a cause.
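One way to probe a correlation before taking it to an expert is to compare like with like on one of the other factors. The sketch below is a minimal illustration in Python, not a statistical method recommended by the course. The records are made-up numbers chosen only to show the effect, and the five-year cut-off for "junior" is an arbitrary assumption.

```python
from statistics import mean

# Made-up records for illustration only: (gender, years_experience, salary)
records = [
    ("F", 2, 30000), ("F", 3, 32000), ("F", 2, 31000), ("F", 10, 60000),
    ("M", 10, 61000), ("M", 11, 62000), ("M", 9, 59000), ("M", 2, 30500),
]

def average_salary_by_gender(rows):
    """Return the mean salary for each gender in the given rows."""
    return {
        gender: mean(salary for g, _, salary in rows if g == gender)
        for gender in ("F", "M")
    }

# Average across everyone: this is where the raw correlation shows up.
overall = average_salary_by_gender(records)

# Average within groups of similar experience (the cut-off is an assumption).
junior = average_salary_by_gender([r for r in records if r[1] < 5])
senior = average_salary_by_gender([r for r in records if r[1] >= 5])

print("Overall gap:", overall["M"] - overall["F"])
print("Junior gap: ", junior["M"] - junior["F"])
print("Senior gap: ", senior["M"] - senior["F"])
```

In this toy data the overall gap is large, but it nearly disappears within each experience group, because most of the women in the sample are junior and most of the men are senior. Real data will rarely be this tidy, which is why the explanation should come from someone who knows the field.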

The inverted pyramid of data journalism

Having a clear question at the start of the whole process helps ensure you don’t lose your focus along the way.

The pyramid begins with a large amount of information which becomes increasingly focused as you drill down into it, until you reach the point of communicating the results.

Step 1 - Compile

Data-driven storytelling “begins in one of two ways: either you have a question that needs data, or a dataset that needs questioning. Whichever it is, the compilation of data is what defines it”

Paul Bradshaw

Compiling data might take the form of:

  • Data supplied directly by an organisation

  • Data found using advanced search techniques

  • Data compiled by scraping hidden online databases

  • Data compiled by converting documents into spreadsheet format

  • Data obtained by pulling information from APIs

  • Data collected through observation, surveys, crowdsourcing, etc.

Keep in mind that compiling is:

  • The most important stage

  • Everything else rests on this

  • The stage returned to the most

  • At each subsequent stage, you may find that you need to compile more information

Step 2 - Clean

Being confident in the stories hidden within the data means being able to trust the quality of the data – and that means cleaning it.

There are 2 basic types of data cleaning:

  • Removing human errors

  • Converting data into the format that you require
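Both types of cleaning can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the column names (`municipality`, `households`), the sample rows and the helper functions `clean_name` and `clean_number` are all hypothetical, invented here for illustration.

```python
def clean_name(raw):
    """Remove human errors: stray spaces and inconsistent capitalisation."""
    return " ".join(raw.split()).title()

def clean_number(raw):
    """Convert to the required format: text like '1,250' becomes the integer 1250."""
    digits = raw.replace(",", "").replace(" ", "")
    return int(digits)

# Hypothetical rows as they might arrive from a hand-typed spreadsheet.
raw_rows = [
    {"municipality": "  cape town ", "households": "1,250"},
    {"municipality": "CAPE  TOWN", "households": " 980"},
    {"municipality": "Durban", "households": "2 100"},
]

clean_rows = [
    {
        "municipality": clean_name(row["municipality"]),
        "households": clean_number(row["households"]),
    }
    for row in raw_rows
]

for row in clean_rows:
    print(row)
```

After cleaning, the two spellings of Cape Town match each other and the household counts can be summed or sorted as numbers. Note that `str.title()` is a blunt tool and can mangle names that contain apostrophes or lower-case particles, so check the results by eye.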

Step 3 - Context

As with any source for a story, you need to ask questions of the data: who gathered it, when, for what purpose, and how?

Data can’t always be trusted!

If you began this process with a clear question, it will help you keep your focus at this point.

Step 4 - Combine

You will often need to combine different sources of data using a common field, such as a unique identifier (ward key/code), a name (municipality) or, most commonly, a location.

Combining different datasets enriches the data: it gives you additional information to work with, drawn from additional sources.
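Combining on a common field works much like a VLOOKUP in a spreadsheet. The sketch below shows the idea in Python, assuming two small hypothetical datasets that share a `ward_code` column. The ward codes and figures are invented for illustration.

```python
# Two hypothetical datasets that share a unique identifier, "ward_code".
population = [
    {"ward_code": "W001", "population": 12000},
    {"ward_code": "W002", "population": 8500},
    {"ward_code": "W003", "population": 15200},
]
clinics = [
    {"ward_code": "W001", "clinics": 3},
    {"ward_code": "W003", "clinics": 1},
]

# Index the second dataset by the common field so each lookup is direct.
clinics_by_ward = {row["ward_code"]: row for row in clinics}

combined = []   # rows found in both datasets
unmatched = []  # ward codes with no partner row, to investigate later

for row in population:
    match = clinics_by_ward.get(row["ward_code"])
    if match is None:
        unmatched.append(row["ward_code"])
        continue
    combined.append({**row, "clinics": match["clinics"]})

print(combined)
print("No clinic data for:", unmatched)
```

Keeping a list of unmatched keys matters as much as the join itself. A ward that is missing from one dataset may point to a spelling difference, a boundary change or a gap in the source, and each of those is a question to ask before publishing.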

Step 5 - Communicate

  • What is your narrative?

  • What visualisations will you use?

  • What case studies can you use?

  • Don’t forget the storytelling!

Give your story a human face or a name, so that it has personal relevance for your readers.