The Story of Controlled Experimentation
If science had to be explained in the fewest words possible, I would choose to call it "the discovery of causality". The finest of our instincts and intelligence often guide us towards asking why things happen and what would happen if those things had not happened. It seems as if underlying the laws of nature there is a causal graph that links causes to their effects. Finding and understanding these causal links to design systems that harness the power of causality is the primary pursuit of science.
Observations and data are the inputs to our study of causality, but data has a sharp way of deceiving us about it. You usually start by looking at things that happen together in a dataset to get a peek into what is causing what. But what happens when you see the price of tomatoes correlated with the price of cars (both are driven by inflation), children's shoe sizes correlated with the size of their bags (both are driven by age), or the number of birds correlated with the number of babies in a city (both are driven by settlements)? You get deceived, because two seemingly unrelated things often turn out to be correlated in the data. See this link for interesting spurious correlations.
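To make the shoe-size example concrete, here is a minimal simulation sketch. The numbers are invented purely for illustration: I assume age drives both shoe size and bag size with made-up coefficients, and that neither has any effect on the other. The `pearson` helper is written out by hand so the snippet needs nothing beyond the standard library.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy model: age drives both quantities; neither drives the other.
ages = [rng.uniform(5, 15) for _ in range(20000)]
shoe_sizes = [0.5 * age + rng.gauss(0, 1) for age in ages]
bag_sizes = [2.0 * age + rng.gauss(0, 4) for age in ages]

# Correlation over all children, with age left free to vary.
overall = pearson(shoe_sizes, bag_sizes)

# Hold the confounder (roughly) fixed: look only at children aged about 10.
same_age = [(s, b) for a, s, b in zip(ages, shoe_sizes, bag_sizes)
            if 9.5 <= a < 10.5]
within = pearson([s for s, _ in same_age], [b for _, b in same_age])

print(f"correlation across all ages:      {overall:.2f}")
print(f"correlation among ~10-year-olds:  {within:.2f}")
```

Across all ages the two sizes look strongly related, but once age is held nearly constant the correlation all but vanishes, which is what you would expect if age were the only thing linking them.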
You realise that observational data alone cannot reliably establish causality. What you need is an experiment: one that lets you hold everything else fixed, change just one cause and see its effect.
Experimentation is the scientific approach to studying causality because it lets you cancel out the effect of all other factors and isolate a single causal link to observe its impact. However, as you will see, an experiment itself has some very interesting ways to deceive you, and designing experiments that can be trusted is the job of a scientist.
Randomised Controlled Trials (RCTs)
Randomised Controlled Trial (RCT) is the statistical name for one of the most trustworthy approaches to experimentation, one that in theory cancels out, on average, all other factors in an experiment. These "other factors" are also called confounders. Statistically, a confounder is a variable that affects both the cause and the effect under study, and so distorts the apparent effect of one on the other. Inflation, age and settlements in the three examples of spurious correlation above are confounders. The story of how RCTs were invented is an interesting one, and it helps you grasp the intuition behind cancelling out confounders.
RA Fisher was a scientist who wanted to study the impact of various fertilisers on the yield of his crops. Fisher started by neatly splitting his field into two parts and applying the fertiliser to only one. He tried to measure the impact of the fertiliser by comparing the yield of the two halves, but soon realised that the halves received unequal sunlight and also differed in other ways, such as soil quality and water distribution. These were additional causal links that distorted the measured impact of the fertiliser. So Fisher divided his field into smaller squares and randomly chose the squares to put the fertiliser on. This randomisation helped Fisher cancel out the impact of all the other factors, because those factors were now spread at random across both groups (fertiliser and no fertiliser).
Randomisation became the key weapon in Fisher's attempt to cancel out the effect of all other factors and isolate the causal link between applying the fertiliser and the yield of the crop. It gave us the gold standard for isolating causal impact, and to date it remains the most trusted form of experiment wherever such randomisation can be enforced.
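The field story can be sketched as a small simulation. Everything here is an assumption made for illustration, not a record of the real experiment: a strip of 1000 plots, a sunlight gradient that increases from west to east, and a true fertiliser effect of 2 units of yield. The sketch compares fertilising one whole half against fertilising a randomly chosen half of the plots.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0   # assumed yield gain from the fertiliser
N_PLOTS = 1000      # hypothetical plots laid out west to east

def estimated_effect(fertilised, rng):
    """Difference in mean yield between fertilised and unfertilised plots."""
    treated, control = [], []
    for plot in range(N_PLOTS):
        sunlight = 10.0 * plot / N_PLOTS  # confounder: more sun to the east
        gain = TRUE_EFFECT if plot in fertilised else 0.0
        crop_yield = 20.0 + sunlight + gain + rng.gauss(0, 1)
        (treated if plot in fertilised else control).append(crop_yield)
    return mean(treated) - mean(control)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Design 1: fertilise the whole eastern half of the field.
east_half = set(range(N_PLOTS // 2, N_PLOTS))
# Design 2: fertilise a randomly chosen half of the plots.
random_plots = set(rng.sample(range(N_PLOTS), N_PLOTS // 2))

split_estimate = estimated_effect(east_half, rng)
randomised_estimate = estimated_effect(random_plots, rng)

print(f"true effect:          {TRUE_EFFECT:.2f}")
print(f"split-field estimate: {split_estimate:.2f}")
print(f"randomised estimate:  {randomised_estimate:.2f}")
```

In the split design the sunnier half also happens to be the fertilised half, so the sunlight advantage gets added to the fertiliser's apparent effect. In the randomised design both groups receive about the same sunlight on average, so the estimate lands close to the true effect.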
Application of RCTs to experiments
Randomised Controlled Trials have been adopted by various domains of science under various names. Clinical trials and A/B tests are both randomised controlled trials, while hypothesis tests such as the t-test are the statistical tools typically used to analyse their results. Any study that controls the effect of confounders by randomising the treatment given to small sub-units in the experiment is in effect a randomised controlled trial.
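As a sketch of what "randomising the treatment given to small sub-units" can look like in software, here is one common approach: hash a unit's identifier together with an experiment name and use the result to pick a group. This is an illustrative assumption rather than the method of any particular tool; the function name `assign_variant`, the experiment name and the user ids are all hypothetical.

```python
import hashlib

def assign_variant(unit_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a unit to a variant via a hash.

    The same unit always lands in the same variant for a given experiment,
    while the split across many units is close to uniform.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Hypothetical experiment name and user ids, for illustration only.
assignments = [assign_variant(f"user-{i}", "new-checkout-button")
               for i in range(10000)]
share_treatment = assignments.count("treatment") / len(assignments)

print(f"share of units in treatment: {share_treatment:.3f}")
```

Because the assignment depends only on the hash and not on anything about the unit itself, characteristics such as age or location should end up spread roughly evenly across both groups, much like the sunlight across Fisher's squares.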
Medical science started using statistical tests to assess the effectiveness of drugs by giving them to a randomised set of patients. An even more rigorous application came in psychology and psychiatry, where scientists developed blind and double-blind designs to control the human biases introduced in the process of testing. Social scientists also found ethical ways to implement RCTs and published numerous studies measuring the impact of various policies on social factors. However, not all causal links can be tested ethically with a randomised controlled trial. One such example is the impact of smoking on mortality: you cannot randomly assign people to smoke just to see who dies sooner.
Recently, RCTs have become popular in the domain of web-based products, where they go by the name of A/B testing. A/B testing has caught on across internet companies as an easy way to develop a product while driving up the metrics of interest. For instance, big companies like Microsoft, Amazon and Facebook constantly run hundreds of experiments on their users to find out what makes customers engage more with their products, what makes them buy more and what makes them click on advertisements. A/B testing has taken the world by storm as multiple online vendors like VWO (the company I work for), Optimizely and Google Optimize have made it a 15-minute job to set up your first test.
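To give a feel for how the result of an A/B test might be read, here is a minimal sketch of a two-proportion z-test, one textbook way to compare conversion rates between two groups. It is not a description of how any vendor named above computes results, and the visitor and conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up numbers: 5000 visitors per variant, 200 vs 260 conversions.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value says that a gap this large would be unusual if the two variants truly converted at the same rate. That conclusion is only as trustworthy as the randomisation that split the visitors in the first place.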
With the emergence of data, experimentation has been democratised to the extent that it is no longer just a tool for scientists but also for working professionals. Going forward, what matters is how quickly and reliably you can test your beliefs, and how quickly you can discard those that you cannot verify through experimentation.
From here on, I have decided that experimentation will be a major theme of my blog. I invite questions about and criticism of all my ideas.