Correlation vs. Causation

Tuesday, September 08, 2015

Identifying a relationship between data points is relatively simple. But just because a relationship exists does not mean that one data point or event caused the other. To derive value from your data, it helps to understand how one factor affects, or doesn’t affect, another, which gives meaningful context to the overall analysis.

Without getting too technical, let’s examine what the two terms, correlation and causation, actually mean and why they matter in the larger scheme.

First, correlation is a statistical measure of the relationship between two variables. For example, there may be a strong, positive relationship between email sends and site visits. Correlation gives you a strong starting point for further testing and optimization. If you know that there is a relationship between site search and revenue, you can then look at why. There are, however, some things to keep in mind when working with correlations.
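To make the idea concrete, here is a minimal sketch of how the strength of such a relationship is commonly measured, using the Pearson correlation coefficient. The weekly numbers and the names `pearson_r`, `email_sends`, and `site_visits` are invented for illustration; they are not real campaign data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly figures, invented for illustration only.
email_sends = [1000, 1500, 1200, 2000, 1800, 2500]
site_visits = [320, 450, 380, 610, 540, 760]

r = pearson_r(email_sends, site_visits)
print(f"r = {r:.3f}")
```

A value of r close to +1 means the two series rise and fall together, close to -1 means they move in opposite directions, and close to 0 means there is little linear relationship. Note that nothing in the calculation says which variable, if either, is driving the other.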

Take the email sends and site visits example above. You shouldn’t assume that just because someone received the email and went to the website, the email caused the visit. It is certainly possible that the person had intended to go to the site anyway and the email had nothing to do with it. Another case could be a person who was searching for, say, the term “black dresses,” saw a paid search ad, clicked on it, and purchased something. Here the ad may have facilitated a purchase she was already looking to make. You really need to be cautious when looking at correlations, especially in the online world. This is not to undermine the role of digital media but to appreciate that it is often a great facilitator of, or complement to, other activity: what I have often referred to as the greatest marketing ‘condiment’ that has ever existed.

Next, causation means that one event or action actually brought about a second event or action. Going back to the email example, this could mean that my act of opening an email and clicking on a link in it caused me to go to a website.

Understanding what types of variables you are analyzing gives you a much better picture of how likely it is that one element caused a particular outcome. These types include leading indicators (if we see X today, we can expect Y to happen), concurrent indicators (as X happens, Y is probably happening at the same time), and trailing indicators (usually seen when there is a long delay in reporting; if we see that X happened, Y has probably already happened as well).
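One rough way to see which kind of indicator you are dealing with is to shift one series against the other and check where the correlation peaks. The sketch below assumes two daily series of equal length; the function names (`lagged_r`, `best_lag`) and the data are hypothetical, and the visits series was deliberately constructed to follow the email series by one day.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

def lagged_r(xs, ys, lag):
    """Correlate xs[t] with ys[t + lag]; lag > 0 means X is observed first."""
    if lag >= 0:
        a, b = xs[:len(xs) - lag], ys[lag:]
    else:
        a, b = xs[-lag:], ys[:len(ys) + lag]
    return pearson_r(a, b)

def best_lag(xs, ys, max_lag):
    """Lag in [-max_lag, max_lag] at which the correlation is highest."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: lagged_r(xs, ys, k))

# Hypothetical daily figures; visits were built to trail sends by one day.
email_sends = [5, 9, 4, 12, 7, 15, 6, 10, 3, 11]
site_visits = [9, 11, 19, 9, 25, 15, 31, 13, 21, 7]

lag = best_lag(email_sends, site_visits, max_lag=2)
if lag > 0:
    kind = "leading"
elif lag == 0:
    kind = "concurrent"
else:
    kind = "trailing"
print(f"correlation peaks at lag {lag}: X looks like a {kind} indicator of Y")
```

This only tells you about timing. A leading indicator that reliably shows up before an outcome is useful for forecasting, but on its own it still does not prove that X causes Y.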

If you have the time and the resources to actually prove causality, then definitely go for it.  

The only way to understand correlation and causation and their overall impact is through A/B testing, customer surveys, and additional analysis. When it comes to testing, do your best to keep as many attributes as possible constant, so that you can isolate the one variable you are actually trying to test. So, what is the point of doing all of this analysis? You need to be able to determine what the data actually shows, rather than look for data that merely confirms your ideas. With clear, consistent, focused research and analysis, you can accomplish this and gain some valuable insights. Above all, test everything and don’t assume anything.
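As an illustration of what such a test might look like for the email example, suppose customers are randomly split into a group that receives the email and a group that does not, with everything else held the same. Random assignment is what handles the "they were going to visit anyway" problem, because those customers should land in both groups in roughly equal proportion. The sketch below compares the two visit rates with a standard two-proportion z-test; the function name and all of the counts are assumptions made up for this example.

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in two rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that the groups do not differ.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical counts: group A got no email, group B got the email.
visits_a, customers_a = 500, 10_000
visits_b, customers_b = 600, 10_000

z, p_value = two_proportion_z_test(visits_a, customers_a, visits_b, customers_b)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in visit rates is unlikely to be chance alone, which is much stronger evidence that the email itself drove visits than a correlation between sends and visits would be. For small samples or very low rates, a different test may be more appropriate.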

Written by Meghan Rogers
