Information is beautiful. Usually.

I love analysing data.

I think it would be extremely difficult to pursue a career in digital marketing if you didn’t enjoy looking at numbers and trying to work out the underlying story. Data is the new oil and I am happy to confess to an obsession with looking at data to build an opinion.

The old adage about a picture being worth a thousand words is usually true and I think one of the challenges that we face, as digital marketers, is how we can present complex data in a way that immediately makes sense. In many cases, a visual representation of numbers is the best way to achieve this. Who doesn’t love a decent infographic? They are surely better than pivot tables?

I therefore spend quite a bit of time thinking about how to visualise data and love tools such as Google Data Studio, which make it much easier to tell a story. It is also why there are books such as these in the Browser Media library:

[Image: Manipulation of data]

It is perhaps this personal passion that led to so much frustration over the past year. Raw data has simply not backed up the hysterical headlines that have been ever-present in the media. I had a bit of a rant at the mainstream media previously, but thought that it was about time I turned my attention to the Gov coronavirus briefings, as they have been a constant source of rage in my brain.

Why rage?

Simply because the quality of the data that has been presented has been lamentable. From staggeringly unprofessional graphs (a billion to a slide, all with varying scales, etc.) to blatant lies, I have often found myself shouting at the screen and steam has erupted from my ears as the data geek within me reacts to some pitiful data visualisations.

Mount Friedlein was erupting again yesterday as I watched the latest briefing. The slide that saw lava spewing from my brain was this one:

[Image: vaccine efficacy]

On the surface of things, I should be applauding this. It is far removed from the visual diarrhoea that we have seen before, where the slides were crammed full of tiny graphs that literally didn’t fit on the screen. It is visually uncluttered and most people understand the concept of percentage, so having 100 dots is a reasonable graphical representation of an analysis based on percentages.

Although obviously not actually accurate, I can forgive the rounding of data to ensure that everything comes in 5% chunks as this keeps the graphs looking pretty. I do not wish to nitpick.

My misgivings start with a general questioning of why the data ranges are so large. A 55-70% reduction in symptomatic disease for those that are vaccinated feels like a huge range? Why can they not present actual data from trials that I hope they are managing carefully, given the seismic effect on society of all the measures that the Gov has imposed? I am not sure why this should be a range at all – surely this should be an absolute number? Perhaps not for symptomatic disease, but do they not know how many people actually go to hospital or die? If the sample size is too small (which is a credible reason to have a range), would it not be more prudent to wait until there was more faith in the data?

More importantly, why is it an estimate at all? Either you have accurate data or you don’t. This is very serious stuff and I do not believe that best guesses should be presented to us when the ramifications for everything that is going on are so serious. If you want to use data to justify policy, is it not reasonable to expect the data to be incontestable numbers rather than brushstroke estimates? There are far too many caveats (‘may be’, ‘could’, ‘possibly’, ‘suggests’, ‘estimates’, etc.) attached to a lot of what we are being told. This undermines the confidence we have in the data.

My biggest gripe, however, is the implied message. As I mentioned earlier, visual representation of data can help tell a story. The story that we are being given here is that 100% of the unvaccinated will end up ill, be hospitalised and end up dead. This is very plainly untrue. Even the Gov admits this in the £million advertising campaigns that have told us how so many people do not even know that they have Covid.

I can hear you shouting that this is not the case and that the visualisation is showing that, for every 100 unvaccinated illnesses, there will be 30-45 illnesses amongst the vaccinated. That is indeed true, but you will have thought about the charts more than most and the use of 100 blue dots in a percentage table is deliberately alarming.

I appreciate that it can be very difficult to really measure the efficacy of any drug without challenge trials (where you deliberately expose people to the virus), but it is very important to understand the difference between relative risk reduction (RRR) and absolute risk reduction (ARR). All the vaccine efficacy data is based on relative risk reduction, which can make the numbers look much better than the absolute picture suggests. I will defer to an excellent summary from The Lancet rather than try to explain this difference myself as it does get a little confusing and I hope that The Lancet can be trusted.
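For anyone who prefers numbers to definitions, here is a minimal Python sketch of the two measures. It uses the standard definitions (ARR is the difference between the two risks; RRR is that difference divided by the risk in the untreated group). The function name and the attack rates are invented purely for illustration; they are not taken from any trial or from The Lancet piece.

```python
def risk_reduction(control_risk: float, treated_risk: float) -> tuple[float, float]:
    """Return (relative, absolute) risk reduction, both as fractions."""
    arr = control_risk - treated_risk   # absolute risk reduction
    rrr = arr / control_risk            # relative risk reduction
    return rrr, arr


# Hypothetical attack rates, made up for illustration only (not trial data).
high_baseline = risk_reduction(control_risk=0.01, treated_risk=0.003)
low_baseline = risk_reduction(control_risk=0.0004, treated_risk=0.00012)

for label, (rrr, arr) in [("high baseline", high_baseline),
                          ("low baseline", low_baseline)]:
    # Multiply ARR by 100 to express it in percentage points.
    print(f"{label}: RRR {rrr:.0%}, ARR {arr * 100:.3f} percentage points")
```

Both scenarios report the same 70% relative reduction, yet the absolute reduction shrinks from 0.7 to 0.028 percentage points as the baseline risk falls. That gap is the whole point of the RRR versus ARR distinction.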

I absolutely appreciate that the Gov has a message that it wants to convey (get a vaccine) but I feel that this particular graph is another example of deliberate scaremongering and is not entirely honest, even if it is statistically accurate. At the very least, would it not be right to show the reduction in numbers from symptomatic disease > hospitalisation > death for both segments of the population? The only plausible reason not to do that is to continue to scare people.

Personally, I prefer the following representation of the current level of risk of symptomatic disease for the unvaccinated and vaccinated populations of England*:

[Image: Realistic representation of data]

*The official prevalence of Covid 19 amongst the UK population is 1 in 1,180 according to the latest ONS data. This equates to 0.08% of the population. A lot of reports suggest that 80% of these cases are asymptomatic, but let’s use 50% to err on the side of caution. That would mean that 0.04% of the unvaccinated population would develop symptomatic disease. If we use the highest level of efficacy of 70%, this means that just 0.012% of the vaccinated population would develop symptomatic disease.
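The arithmetic in that footnote can be checked with a few lines of Python. This is only a sketch of my back-of-an-envelope sum: the variable names are mine, and the 50% symptomatic share is the cautious assumption stated above rather than a measured value. Carrying the unrounded figures through gives a vaccinated risk nearer 0.013% than the 0.012% you get by rounding to 0.04% first; either way, it is a very small number.

```python
# Inputs quoted in the footnote above.
prevalence = 1 / 1180        # ONS: 1 in 1,180 people infected
symptomatic_share = 0.50     # assumption: half of cases show symptoms
efficacy = 0.70              # top of the 55-70% range on the Gov slide

# Share of each group expected to have symptomatic disease.
unvaccinated_risk = prevalence * symptomatic_share
vaccinated_risk = unvaccinated_risk * (1 - efficacy)

print(f"Prevalence:        {prevalence:.2%}")
print(f"Unvaccinated risk: {unvaccinated_risk:.2%}")
print(f"Vaccinated risk:   {vaccinated_risk:.3%}")
```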

If we are looking at data from a percentage perspective, should we not look at these percentages applied to the population as a whole? We need to look at the differences between different segments of the population, but applying percentages to very small numbers can be misleading. The thin slices in my mock-up above are already oversized, as I had to bump them up a bit so that you can actually see them. I hope it demonstrates the point though – when you look at the big picture, we have every reason to be very positive. Thankfully, only a tiny percentage of our population is going to be ill, let alone hospitalised.

The moral of my ranty story is that you should always challenge data that is presented to you. Former Prime Minister Benjamin Disraeli is credited with the saying, “There are three kinds of lies: lies, damned lies, and statistics.” We tend to trust graphs and data more than a written opinion, but there is no doubt that skilful manipulation of data can help underpin almost any message and can absolutely cloud the truth. Data is often presented as ‘fact’ but you should always ask yourself who is presenting the data and what the numbers really mean.

This is, of course, not just relevant to government media briefings. Digital marketing is very much data-driven and very much subject to manipulation. I have seen far too many boastful digital marketing reports that eulogise about triple-digit percentage growth in web traffic (to use one example). Whilst that sounds great, and we all tend to gravitate towards green arrows pointing up, going from 4 visits per month to 8 is hardly going to set the world on fire. It is also very important to look past the numbers – would you rather have 500 extra visits per month from highly engaged visitors or 4,000 extra visits that leave your site immediately? Vanity metrics are useless metrics and must be challenged.
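A quick sketch of why the headline percentage needs the absolute number next to it. The 4-to-8 example is the one above; the larger site is a hypothetical comparison that I have made up, and the function name is mine.

```python
def growth(before: int, after: int) -> tuple[float, int]:
    """Return (growth as a fraction of the starting value, absolute change)."""
    change = after - before
    return change / before, change


small_site = growth(4, 8)         # the example from the paragraph above
large_site = growth(4000, 5000)   # hypothetical site, for comparison only

for label, (pct, extra) in [("small site", small_site),
                            ("large site", large_site)]:
    print(f"{label}: {pct:.0%} growth, {extra} extra visits per month")
```

The small site wins on percentage growth (100% against 25%) but has gained 4 visits, whereas the hypothetical large site has gained 1,000. Neither figure says anything about whether those visitors were engaged, which is the second half of the point.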