The Fundamentals of Data-driven Storytelling

4.1 What to do with disorganised data?

"Data is often disorganised, making analysis and manipulation into visualisations difficult."


Last updated 2 years ago

In this lesson we will focus on one of the most important parts of data-driven storytelling, which is also the stage where most mistakes are made.

  • We will learn how to handle data, and here we encounter the problem of human error: every time a person interacts with the original information, mistakes can creep in.

  • Sometimes cleaning alone can take 40% to 60% of the time required by the entire data pipeline.

  • We will focus on some cleaning techniques and develop an understanding of the logic behind them.

  • We will identify which variables in the data we are interested in, because taking on the titanic task of cleaning everything might not be necessary, and time is an asset we cannot waste.
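
The last point, cleaning only the variables you need, can also be sketched in code for readers who work outside a spreadsheet. The example below is a minimal illustration in Python, not part of the course material: the sample data, the column names and the `clean_selected` helper are all hypothetical, and it assumes the story only needs the country, year and population columns.

```python
import csv
import io

# Hypothetical raw data: column names and values are invented for illustration.
RAW_CSV = """Country , Year,Population ,Notes,Source URL
 kenya ,2019,"47,564,296",census,example.org/a
Uganda,2014, 34634650 ,census,example.org/b
"""

# Only the variables the story needs (an assumption for this sketch).
WANTED = ["Country", "Year", "Population"]


def clean_selected(raw_text, wanted):
    """Keep only the wanted columns and tidy their values."""
    reader = csv.DictReader(io.StringIO(raw_text))
    # Header names may carry stray spaces, so normalise them first.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    rows = []
    for record in reader:
        # Ignore every column that is not in the wanted list.
        row = {key: record[key].strip() for key in wanted}
        row["Country"] = row["Country"].title()
        row["Year"] = int(row["Year"])
        # Remove thousands separators before converting to a number.
        row["Population"] = int(row["Population"].replace(",", ""))
        rows.append(row)
    return rows


cleaned = clean_selected(RAW_CSV, WANTED)
for row in cleaned:
    print(row)
```

The "Notes" and "Source URL" columns are never touched, so no time is spent tidying values that will not appear in the story. The same idea applies in a spreadsheet: copy the columns you need to a new tab and clean only those.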

Organising data