2.1 Turning websites and PDFs into machine readable data

The second part of the data storytelling pipeline is "Get". By getting data, we mean taking the data you have found and turning it into a machine readable format that you can work with. If you found your dataset on an open data portal, this is easy - you should be able to download a CSV or XLS file from the source. In many cases, however, there are a few steps to go through in order to extract data from a website or document such as a PDF.

In this lesson we will explore how to turn websites into machine readable data. Machine readable data is data in a format that a computer can process, which means it must be structured.

In practice, machine readable data is stored as text and numbers in a structured digital file, rather than as a scanned image or a page laid out only for people to read. Our goal when working with procurement data is to get the information we have been supplied into this format, preferably as a spreadsheet, so that we can perform analysis on it.
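To make the idea concrete, here is a minimal sketch in Python of turning a table on a web page into CSV, which any spreadsheet program can open. It uses only the standard library. The HTML fragment, the supplier names and values, and the names TableParser and html_table_to_csv are all invented for illustration; a real page will usually be messier and may need a dedicated scraping tool.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment, invented for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Supplier</th><th>Contract value</th></tr>
  <tr><td>Example Supplies Ltd</td><td>12000</td></tr>
  <tr><td>Sample Works Co</td><td>8500</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while we are inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The output has one line per table row with the cells separated by commas. Saved to a file ending in .csv, it is the kind of structured data the rest of the pipeline can work with.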
