The Fundamentals of Data-driven Storytelling

Initial steps for verification

There are a few standard things we examine before working on a dataset:

Is the universe represented, i.e. is the dataset complete?

  • When we talk about data, we think of it in relation to a total population. Has everyone in that population been counted, or is the data sampled? Sampled data is when, for example, a thousand people are asked a question and the results are used to infer something about the whole population.

  • Opinion polls, for example, are always sampled data.

  • Sampled data is often fine, as long as we know how the sample was drawn and that it is representative. For example, just because there are ten labradors at the park and no other dogs, we cannot assume every dog is a labrador: one park is not a representative sample of all dogs.

  • When dealing with journalistic data, we have to ask why a certain selection was chosen for the sample. Does it bias the data? This is especially important when looking at press releases that claim large numbers of people like a product or act in a particular way.
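One way to make the question of representativeness concrete is to compare the make-up of a sample with what is already known about the whole population. The sketch below is only an illustration: the function name `share_gaps`, the breed names and all the figures are hypothetical, echoing the park example above, and it assumes reliable population shares are available from another source.

```python
def share_gaps(sample_counts, population_shares):
    """Return, for each group, its share of the sample minus its share of the population.

    A value near 0 means the group is represented roughly in proportion;
    a large positive or negative value suggests the sample is skewed.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    groups = set(sample_counts) | set(population_shares)
    return {
        group: sample_counts.get(group, 0) / total - population_shares.get(group, 0.0)
        for group in groups
    }


# Hypothetical figures: ten labradors seen at one park,
# compared with made-up shares of dog breeds across a city.
park_sample = {"labrador": 10}
city_dogs = {"labrador": 0.2, "terrier": 0.5, "poodle": 0.3}

gaps = share_gaps(park_sample, city_dogs)
for breed in sorted(gaps):
    # Differences are shown in percentage points.
    print(f"{breed}: {gaps[breed] * 100:+.0f} points")
```

Large gaps do not prove the data is wrong, but they are a prompt to ask how the sample was chosen before drawing conclusions from it.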

Check the quality of the data – look for errors or inconsistencies

  • Sometimes data can give itself away in the same way as fake news sites or phishing emails. Poorly written copy filled with spelling mistakes indicates that little care was taken and that a document could be inauthentic. Likewise, data that is riddled with errors has likely been poorly captured and processed.
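A few of these checks can be automated before any analysis starts. The following is a minimal sketch using only Python's standard library; the helper names `quality_report` and `case_variants`, the column names and the sample rows are all invented for illustration, and real datasets will need checks tailored to their own columns.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data with a blank cell, a non-numeric value,
# a repeated row and an inconsistently capitalised name.
SAMPLE = """\
town,population
Springfield,30720
Shelbyville,
springfield,30720
Ogdenville,n/a
Springfield,30720
"""


def quality_report(rows, numeric_columns=()):
    """Count blank cells, non-numeric values in numeric columns, and exact duplicate rows."""
    report = {
        "rows": 0,
        "blank_cells": Counter(),
        "non_numeric": Counter(),
        "duplicate_rows": 0,
    }
    seen = set()
    for row in rows:
        report["rows"] += 1
        key = tuple(repr(value) for value in row.values())
        if key in seen:
            report["duplicate_rows"] += 1
        seen.add(key)
        for column, value in row.items():
            text = "" if value is None else str(value).strip()
            if not text:
                report["blank_cells"][column] += 1
            elif column in numeric_columns:
                try:
                    float(text.replace(",", ""))
                except ValueError:
                    report["non_numeric"][column] += 1
    return report


def case_variants(rows, column):
    """Find values in one column that differ only by capitalisation."""
    spellings = {}
    for row in rows:
        text = (row.get(column) or "").strip()
        if text:
            spellings.setdefault(text.casefold(), set()).add(text)
    return {key: sorted(found) for key, found in spellings.items() if len(found) > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = quality_report(rows, numeric_columns=("population",))
variants = case_variants(rows, "town")
print(report)
print(variants)
```

A report like this does not say whether the data is trustworthy; it only points to the rows worth a closer look or a question to the source.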

If you're in any doubt, consult an expert in the field, who may be able to help ascertain whether the data is genuinely what it appears to be.

