LogoLogo
The Fundamentals of Data-driven Storytelling
The Fundamentals of Data-driven Storytelling
  • About this course
    • Course Introduction
  • Module 1 - Find
    • 1.1 How to Find Data for Storytelling and journalism
      • Starting with a question
      • Open data portals and platforms
      • Other sources of data
    • 1.2 How to get better data from a Goolge Search
      • Searching for filetypes and formats
      • More on Advanced Search operators
      • Other common Google Search operators
    • 1.3 Sourcing your own data
      • Creating a Google Form for Research
      • Creating a questionnaire with TypeForm
      • Using quizzes and comments as a sources of data
  • Module 2 - Get
    • 2.1 Turning websites and PDFs into machine readable data
      • Scraping data with Tabula
    • 2.2 An introduction to spreadsheet software
      • Google Sheets, Microsoft Excel and Libre Office Calc.
      • Finding your way around a spreadsheet
      • Simple web scraping with Google Sheets
  • Module 3 - Verify
    • 3.1 Can I use this data in my work?
      • Initial steps for verification
      • What do these column headings mean?
  • Module 4 - Clean
    • 4.1 What to do with disorganised data?
      • Why is clean data important?
      • Keep your data organised
      • Cleaning data cheatsheet
  • Module 5 - Analyse
    • 5.1 What is the story within the data?
      • Spreadsheet rows, columns, cells and tabs
        • Spreadsheet formats, forumlas and essential shortcuts
          • Using the VLOOKUP Function
            • Combine Data From Multiple Spreadsheets
    • 5.2 How to turn numbers into stories
  • Module 6 - Visualise
    • 6.1 Ways we visualise data
    • 6.2 Why we visualize Data
    • 6.3 How to visualise data
  • Course Testing & Feedback
    • ⏱️Quick course exam
    • 🎓Extended course exam
    • 📝Survey and feedback
Powered by GitBook
On this page
  1. Module 4 - Clean
  2. 4.1 What to do with disorganised data?

Cleaning data cheatsheet

PreviousKeep your data organisedNext5.1 What is the story within the data?

Last updated 2 years ago

Welcome to our lesson on Data Cleaning! By the end of this lesson, you will have a solid understanding of common data cleaning techniques and best practices, and be able to apply these skills to real-world datasets.

Whether you are a journalist, data analyst, researcher, or anyone working with data, this short lesson will provide you with the tools and knowledge you need to clean and prepare your data for analysis. Let's get started.

Click the expanders bellow to view the following topics about data cleaning.

The given example demonstrates the use of Google Sheets, but the same cleaning can also be accomplished using Microsoft Excel. For further information on how to cleanse data using Microsoft Excel, please refer to this link.

Open Google Sheets and import your data:

First, open Google Sheets and import the data you want to clean. You can do this by clicking on File > Import, and selecting your data file.

Remove duplicates:

To remove duplicates from your data, select the column(s) that you want to check for duplicates, then click on Data > Remove duplicates. This will remove any duplicate rows based on the selected columns.

Remove blank rows or columns:

To remove blank rows or columns, select the row(s) or column(s) that you want to remove, then right-click and select "Delete row" or "Delete column".

Split data into columns:

If your data is in a single column but should be split into multiple columns, you can use the "Split text to columns" feature. Select the column that you want to split, then click on Data > Split text to columns. Choose the delimiter that separates your data, such as a comma or space, and Google Sheets will split the data into separate columns.

Merge data from multiple columns:

To merge data from multiple columns into a single column, you can use the "Concatenate" function. Type "=CONCATENATE(A1,B1,C1)" into a new column, where A1, B1, and C1 are the columns you want to merge.

Format data:

Finally, you may need to format your data to make it more readable. You can do this by selecting the column(s) that you want to format, then clicking on Format > Number. Here you can choose the desired format, such as currency or percentage.

Check for errors:

Make sure to scan your data for common errors such as spelling mistakes, incorrect dates, or missing values. You can use the "Find and Replace" feature to search for and correct errors.

Remove unnecessary characters:

If your data contains unnecessary characters, such as leading or trailing spaces, you can use the "Trim" function to remove them. Simply select the column you want to clean, then type "=TRIM(A1)" into a new column, where A1 is the cell you want to trim.

Use conditional formatting:

Conditional formatting can help you quickly identify outliers or other patterns in your data. For example, you can highlight cells that are above or below a certain value, or cells that contain specific text.

Use data validation:

Data validation can help you ensure that your data is accurate and consistent. For example, you can use data validation to create a drop-down list of acceptable values for a specific column, or to set a maximum or minimum value for a certain cell.

See tips here.