Welcome to our lesson on Data Cleaning! By the end of this lesson, you will have a solid understanding of common data cleaning techniques and best practices, and be able to apply these skills to real-world datasets.
Whether you are a journalist, data analyst, researcher, or anyone working with data, this short lesson will provide you with the tools and knowledge you need to clean and prepare your data for analysis. Let's get started.
Click the expanders bellow to view the following topics about data cleaning.
The given example demonstrates the use of Google Sheets, but the same cleaning can also be accomplished using Microsoft Excel. For further information on how to cleanse data using Microsoft Excel, please refer to this link. See tips here.
Open Google Sheets and import your data:
First, open Google Sheets and import the data you want to clean. You can do this by clicking on File > Import, and selecting your data file.
Remove duplicates:
To remove duplicates from your data, select the column(s) that you want to check for duplicates, then click on Data > Remove duplicates. This will remove any duplicate rows based on the selected columns.
Remove blank rows or columns:
To remove blank rows or columns, select the row(s) or column(s) that you want to remove, then right-click and select "Delete row" or "Delete column".
Split data into columns:
If your data is in a single column but should be split into multiple columns, you can use the "Split text to columns" feature. Select the column that you want to split, then click on Data > Split text to columns. Choose the delimiter that separates your data, such as a comma or space, and Google Sheets will split the data into separate columns.
Merge data from multiple columns:
To merge data from multiple columns into a single column, you can use the "Concatenate" function. Type "=CONCATENATE(A1,B1,C1)" into a new column, where A1, B1, and C1 are the columns you want to merge.
Format data:
Finally, you may need to format your data to make it more readable. You can do this by selecting the column(s) that you want to format, then clicking on Format > Number. Here you can choose the desired format, such as currency or percentage.
Check for errors:
Make sure to scan your data for common errors such as spelling mistakes, incorrect dates, or missing values. You can use the "Find and Replace" feature to search for and correct errors.
Remove unnecessary characters:
If your data contains unnecessary characters, such as leading or trailing spaces, you can use the "Trim" function to remove them. Simply select the column you want to clean, then type "=TRIM(A1)" into a new column, where A1 is the cell you want to trim.
Use conditional formatting:
Conditional formatting can help you quickly identify outliers or other patterns in your data. For example, you can highlight cells that are above or below a certain value, or cells that contain specific text.
Use data validation:
Data validation can help you ensure that your data is accurate and consistent. For example, you can use data validation to create a drop-down list of acceptable values for a specific column, or to set a maximum or minimum value for a certain cell.