The Fundamentals of Data-driven Storytelling

What do these column headings mean?

Last updated 2 years ago

Columns describe the rows: each row represents one entry, and the values in the columns describe the attributes of that entry.
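A minimal sketch of this idea in Python, assuming the data arrives as CSV text. The table below is invented for illustration, and the "Student ID" column is a hypothetical addition. The header row supplies the attribute names, and every following line becomes one entry.

```python
import csv
import io

# A tiny, made-up table: the header row names the attributes,
# and each following row is one entry (here, one student).
raw = """Student ID,Gender,Faculty
S001,Female,Science
S002,Male,Law
"""

# DictReader uses the header row as the keys for every entry.
rows = list(csv.DictReader(io.StringIO(raw)))

print(len(rows))             # number of entries (rows)
print(list(rows[0].keys()))  # attributes (column headings)
print(rows[1]["Faculty"])    # one attribute of one entry
```

Reading each row as a dictionary keyed by the column headings makes the relationship explicit: you ask for an attribute of an entry by name, which only works if you know what that name means.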

When we look at a table of data, the first row (or rows) should indicate what each column contains. In the example below, for instance, it's fairly easy to guess that Gender, Population Group, Current Institution and Faculty are all attributes of people.

Example

But what about in this example, taken from an annual report published by the Department of Transport?

You can probably figure out what the Geo_type column contains by reading down a few entries, and it seems obvious that Pr_code relates to province. The rest of the table, however, is meaningless without something that explains the variable names used as column headings.
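"Reading down a few entries" can be done systematically by counting the distinct values in a column. The sketch below assumes CSV input; the rows are invented for illustration and are NOT taken from the Department of Transport report, and `distinct_values` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Invented rows for illustration only; NOT the real report's data.
raw = """Geo_type,Pr_code,Value
District,1,120
Local Municipality,1,45
District,2,300
Local Municipality,2,80
Local Municipality,2,12
"""

rows = list(csv.DictReader(io.StringIO(raw)))

def distinct_values(rows, column):
    """Count how often each value appears in one column."""
    return Counter(row[column] for row in rows)

# A short list of repeated labels hints at a category column;
# a small set of codes hints at a lookup key such as a province code.
for column in ["Geo_type", "Pr_code"]:
    print(column, distinct_values(rows, column).most_common())
```

This only helps you guess. A pattern of values can suggest what a column means, but it cannot confirm it; that still needs documentation from the publisher.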

Good data should come with "metadata" attached, either within the spreadsheet or as a separate file. Metadata is data about the data: it can include an explanation of the column headings, the publication date, the publisher's name and the methodology used to collect the data. It should also tell you the licensing conditions for the dataset, and therefore whether or not you can use it.
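If the metadata includes a data dictionary (a list of column headings and what they mean), you can check a table's headings against it and spot any that are left unexplained. This is a sketch under stated assumptions: `DATA_DICTIONARY` and `check_headings` are hypothetical names, the descriptions are written by hand here rather than taken from real metadata, and "Var_17" is an invented example of an unexplained code.

```python
# Hypothetical data dictionary: in practice these descriptions should
# come from the dataset's own metadata, not be guessed like this.
DATA_DICTIONARY = {
    "Geo_type": "Type of geographic area the row describes",
    "Pr_code": "Province code",  # assumed meaning, to be confirmed
}

def check_headings(headings, dictionary):
    """Split column headings into documented and undocumented ones."""
    documented = {h: dictionary[h] for h in headings if h in dictionary}
    undocumented = [h for h in headings if h not in dictionary]
    return documented, undocumented

# "Var_17" is an invented heading with no explanation anywhere.
headings = ["Geo_type", "Pr_code", "Var_17"]
documented, undocumented = check_headings(headings, DATA_DICTIONARY)

for heading, meaning in documented.items():
    print(f"{heading}: {meaning}")
print("No metadata for:", undocumented)
```

Any heading that ends up in the undocumented list is a column you should not report on until the publisher, or another reliable source, tells you what it means.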

Here's an example of metadata taken from a World Bank dataset.

Not all data comes with metadata attached, though. Be very wary of data whose meaning you can't be 100% sure of.

You can see this for yourself at this link. Open it up and click the button marked "Details". This will show you the first part of the metadata, along with links to more information.