Initial steps for verification

There are a few standard things we examine before working on a dataset:

When we talk about data, we think of it in relation to a total population. Does the data cover everyone in the population, or is it sampled? Sampled data is when, for example, a thousand people are asked a question and the results are used to infer something about the whole population.
Opinion polls are always based on sampled data.
Sampled data is often fine, so long as we know how the sample was drawn and that it is representative. For example, if there are ten labradors at the park and no other dogs, we cannot conclude that every dog is a labrador.
When dealing with journalistic data, we have to ask why a particular group was chosen for the sample, and whether that choice biases the data. This is especially important when looking at press releases that claim large numbers of people like a product or act in a particular way.
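One rough way to check whether a sample looks representative is to compare its make-up with known population figures, for example from a census. The Python sketch below does this for the labrador example. The function name, the 5-percentage-point tolerance and the population shares are illustrative assumptions rather than an established method or real figures; a formal statistical test would be more rigorous.

```python
from collections import Counter


def compare_to_population(sample_groups, population_shares, tolerance=0.05):
    """Hypothetical helper: flag groups whose share of the sample differs from
    their (assumed) known population share by more than `tolerance`.

    Returns {group: (observed_share, expected_share)} for flagged groups only.
    """
    if not sample_groups:
        raise ValueError("sample is empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    flagged = {}
    # Check every group seen in either the sample or the population figures,
    # so groups missing from the sample (or unexpected ones) are caught too.
    for group in sorted(set(population_shares) | set(counts)):
        observed = counts.get(group, 0) / total
        expected = population_shares.get(group, 0.0)
        if abs(observed - expected) > tolerance:
            flagged[group] = (round(observed, 3), expected)
    return flagged


# Illustrative figures only: assume labradors are 20% of all dogs.
park = ["labrador"] * 10
print(compare_to_population(park, {"labrador": 0.2, "other": 0.8}))
```

Here every group is flagged, because ten labradors at one park look nothing like the assumed dog population. An empty result would only mean the sample matches on the groups we checked; it says nothing about other ways the sample might be biased.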

Sometimes data gives itself away in the same way that fake news sites or phishing emails do. Poorly written copy full of spelling mistakes suggests that little care was taken and that a document may be inauthentic. Likewise, data riddled with errors has probably been poorly captured or processed.
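A first pass at spotting error-riddled data can be automated. The Python sketch below counts a few common symptoms in a CSV file: rows of the wrong length, duplicate rows, blank cells, non-numeric values in numeric columns, and implausible values. The function name, the column names and the 0 to 120 age range are hypothetical; which checks matter depends on the dataset, and high counts are a prompt for closer inspection rather than proof the data is bad.

```python
import csv
import io


def error_report(csv_text, numeric_columns, valid_ranges=None):
    """Hypothetical helper: count common data-quality problems in CSV text."""
    valid_ranges = valid_ranges or {}
    report = {"wrong_length": 0, "duplicate_rows": 0, "blank": 0,
              "not_numeric": 0, "out_of_range": 0}
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:  # empty file: nothing to check
        return report
    seen = set()
    for row in reader:
        # Rows with too few or too many fields suggest a broken export.
        if len(row) != len(header):
            report["wrong_length"] += 1
            continue
        key = tuple(row)
        if key in seen:
            report["duplicate_rows"] += 1
        seen.add(key)
        for column, value in zip(header, row):
            value = value.strip()
            if not value:
                report["blank"] += 1
                continue
            if column in numeric_columns:
                try:
                    number = float(value)
                except ValueError:
                    report["not_numeric"] += 1
                    continue
                low, high = valid_ranges.get(column, (float("-inf"), float("inf")))
                if not low <= number <= high:
                    report["out_of_range"] += 1
    return report


# Illustrative data with one of each problem.
sample = "name,age\nAlice,34\nBob,\nCarol,abc\nDan,430\nAlice,34\nEve\n"
print(error_report(sample, {"age"}, {"age": (0, 120)}))
```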

If you're in any doubt, consult an expert in the field, who may be able to help establish whether the data is genuinely what it appears to be.
