Why statistics became an integral part of experimentation?

[Figure: a diagram drawn by the author to explain the core concept of this article]

While the last two posts on experimentation and A/B testing clearly defined the purpose of and need for experimentation, one question persists: how did statistics get involved in the scientific method and become such an integral part of experimentation? The answer is that the world is random, and every dataset you collect carries some element of that randomness. Statistics is the science of dissecting this randomness and making your way through it. To explain why it is so integral to experimentation, let me start with a thought experiment.

Suppose there are two ways to go from point A to point B, and I ask you a simple question: which of the two routes is shorter? You would probably open Google Maps and, within seconds, tell me the exact distance of each route. No statistics is required to answer my question, because Google Maps gives you an exact, deterministic answer.

Let us now change the question slightly. What if I ask you which of the two routes is faster? Most people will again be tempted to open Google Maps and read off the ETA for both routes, but note that these numbers are not as deterministic as distance. They are predicted estimates, because the time taken to cover a route is highly random (thanks to traffic). You still have to answer my question, so suppose the times shown on the two routes are close: 24 minutes on Route 1 and 25 minutes on Route 2. How do you know whether Route 1 comes out faster just by chance, or whether it is usually faster than Route 2? This is where statistics comes in, and this is where you have to ask whether the observed difference is significant or not.
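To make the "faster by chance?" question concrete, here is a small sketch that models each route's travel time as its ETA plus random traffic noise. The ETAs (24 and 25 minutes) come from the example above; the noise level of 5 minutes is an assumption for illustration only.

```python
import random

random.seed(0)

# Assumed model: travel time = ETA + Gaussian traffic noise.
# The 5-minute noise level is a made-up assumption for illustration.
def travel_time(eta_minutes, traffic_sd=5):
    return random.gauss(eta_minutes, traffic_sd)

trips = 1000
route1_wins = sum(
    travel_time(24) < travel_time(25) for _ in range(trips)
)
print(f"Route 1 was faster on {route1_wins / trips:.0%} of simulated trips")
```

With noise this large relative to a 1-minute gap, Route 1 wins only slightly more than half the time, which is exactly why the small observed difference alone cannot settle the question.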

Randomness and how it affects our judgement

Rarely in scientific inquiry, or in data collection generally, do you see neat, deterministic data. Most data is random and is distributed around a central quantity in the form of a bell curve. Whenever we compare two groups of data points collected from an experiment, the question we are really asking is: with randomness set aside, does the data suggest an underlying difference between these two groups or not?

There are two crucial components to this query. If the time difference between the two routes were huge, for instance 25 minutes versus 45 minutes, you would not doubt that Route 1 is faster. Even though 25 minutes has some randomness around it, you would never expect Route 2 to beat Route 1. Think about why a 1-minute difference makes you doubt your observation while a 20-minute difference does not. It is because you feel that a roughly 4% deviation can easily occur due to traffic, whereas an almost 80% deviation in time is unlikely.

To decipher statistical significance, you need the ratio of two quantities:

  1. Effect Size: The difference between the means of the two sample sets; this is essentially the effect size of the experiment.
  2. Standard Deviation: The amount by which the samples in each group vary around their mean; this is the standard deviation of the samples.
  3. Effect Size / Standard Deviation: For developing intuition, it is this ratio that matters in deciding statistical significance. If the effect size is much larger than the standard deviation of the samples, you can be fairly confident the difference is significant. If not, you cannot call the difference significant.

The following picture will give you better intuition about the concept. Observe how the effect size is almost 0.1 (mean of A: 0.25, mean of B: 0.35), while at the same time the standard deviation is also quite large (almost 0.05).
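The ratio from the list above can be computed directly. The two sample sets below are made-up numbers chosen to roughly match the figure (groups centred near 0.25 and 0.35); they are not real data.

```python
import statistics

# Hypothetical samples, chosen to roughly match the figure's numbers:
# group A centred near 0.25, group B near 0.35.
group_a = [0.21, 0.27, 0.24, 0.30, 0.23, 0.25, 0.29, 0.22]
group_b = [0.33, 0.38, 0.31, 0.40, 0.34, 0.36, 0.30, 0.37]

# Effect size: difference between the two group means.
effect_size = statistics.mean(group_b) - statistics.mean(group_a)

# Spread: average of the two within-group standard deviations.
spread = (statistics.stdev(group_a) + statistics.stdev(group_b)) / 2

ratio = effect_size / spread
print(f"effect size = {effect_size:.3f}, spread = {spread:.3f}, ratio = {ratio:.1f}")
```

Here the effect size is a few times larger than the within-group spread, which is the kind of ratio that makes a difference look significant rather than accidental.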

Hypothesis Testing

The formalization of this intuition led to a standard hypothesis-testing procedure in statistics called Student's t-test. The t-test starts with a null hypothesis that there is no difference between the two groups being observed, and then calculates how probable the observed data would be under that hypothesis, using exactly the intuition above. If you have ever heard of p-values, they come from a hypothesis test like this one.
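As a preview of the formalization, the t-statistic is just the effect size divided by the standard error of that difference. Here is a minimal by-hand sketch (Welch's form of the statistic) using made-up travel times for the two routes; the samples are illustrative assumptions, not real data.

```python
import math
import statistics

# Made-up travel times (minutes) for the two routes, for illustration.
route1 = [23, 26, 24, 25, 22, 24, 27, 23, 25, 24]
route2 = [25, 27, 24, 26, 28, 25, 26, 24, 27, 26]

m1, m2 = statistics.mean(route1), statistics.mean(route2)
v1, v2 = statistics.variance(route1), statistics.variance(route2)
n1, n2 = len(route1), len(route2)

# t-statistic: effect size / standard error of the difference in means.
t = (m2 - m1) / math.sqrt(v1 / n1 + v2 / n2)
print(f"t-statistic = {t:.2f}")  # rule of thumb: |t| > 2 hints at significance
```

The full test turns this statistic into a p-value via the t-distribution, which is exactly the anatomy the next post will walk through.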

Hypothesis testing was the go-to solution to the problem described above for almost all of the 20th century. In the next post, I will describe the anatomy of a hypothesis test and how it formalizes the intuition above into a coherent statistical process.