Cleaning data cheatsheet

Welcome to our lesson on Data Cleaning! By the end of this lesson, you will have a solid understanding of common data cleaning techniques and best practices, and be able to apply these skills to real-world datasets.

Whether you are a journalist, data analyst, researcher, or anyone working with data, this short lesson will provide you with the tools and knowledge you need to clean and prepare your data for analysis. Let's get started.

Click the expanders bellow to view the following topics about data cleaning.

circle-info

The given example demonstrates the use of Google Sheets, but the same cleaning can also be accomplished using Microsoft Excel. For further information on how to cleanse data using Microsoft Excel, please refer to this link. See tips here. arrow-up-right

chevron-rightOpen Google Sheets and import your data:hashtag

First, open Google Sheets and import the data you want to clean. You can do this by clicking on File > Import, and selecting your data file.

chevron-rightRemove duplicates: hashtag

To remove duplicates from your data, select the column(s) that you want to check for duplicates, then click on Data > Remove duplicates. This will remove any duplicate rows based on the selected columns.

chevron-rightRemove blank rows or columns:hashtag

To remove blank rows or columns, select the row(s) or column(s) that you want to remove, then right-click and select "Delete row" or "Delete column".

chevron-rightSplit data into columns: hashtag

If your data is in a single column but should be split into multiple columns, you can use the "Split text to columns" feature. Select the column that you want to split, then click on Data > Split text to columns. Choose the delimiter that separates your data, such as a comma or space, and Google Sheets will split the data into separate columns.

chevron-rightMerge data from multiple columns: hashtag

To merge data from multiple columns into a single column, you can use the "Concatenate" function. Type "=CONCATENATE(A1,B1,C1)" into a new column, where A1, B1, and C1 are the columns you want to merge.

chevron-rightFormat data:hashtag

Finally, you may need to format your data to make it more readable. You can do this by selecting the column(s) that you want to format, then clicking on Format > Number. Here you can choose the desired format, such as currency or percentage.

chevron-rightCheck for errors: hashtag

Make sure to scan your data for common errors such as spelling mistakes, incorrect dates, or missing values. You can use the "Find and Replace" feature to search for and correct errors.

chevron-rightRemove unnecessary characters: hashtag

If your data contains unnecessary characters, such as leading or trailing spaces, you can use the "Trim" function to remove them. Simply select the column you want to clean, then type "=TRIM(A1)" into a new column, where A1 is the cell you want to trim.

chevron-rightUse conditional formatting: hashtag

Conditional formatting can help you quickly identify outliers or other patterns in your data. For example, you can highlight cells that are above or below a certain value, or cells that contain specific text.

chevron-rightUse data validation:hashtag

Data validation can help you ensure that your data is accurate and consistent. For example, you can use data validation to create a drop-down list of acceptable values for a specific column, or to set a maximum or minimum value for a certain cell.

Last updated