Reference: This note details the third of the 7 broader topics of discussion in experimentation. The points listed in this blog will later be expanded into their own blog posts.
The early 2000s saw the rise of many internet companies, but few realized that customer behavior is hard to predict and that experimentation is the road ahead. Some of the prominent leaders of experimentation in the industry back then were Amazon, Microsoft, Etsy, Facebook, Google, and Booking.com (reference: Stefan Thomke). Over the past 20 years, these giants have carefully listened to customer preferences, testing each feature and each idea before adding it to the final product. Slowly and steadily, these companies have developed a culture of experimentation and built advanced experimentation platforms that help them run thousands of experiments a year. In this note, we will explore the components of a mature experimentation platform.
Experimentation has flourished alongside the internet industry. A modern experimentation platform lets you do much more with the collected data than was possible in earlier applications of experimentation. Most of these features are customised to a company's own experimentation needs, whereas most third-party vendors still provide a limited set of features that are widely applicable. The components below should give you an idea of the true power of a sophisticated experimentation platform.
The various interesting components
In this section, I list a few of the major components that a sophisticated experimentation engine can have. This list is based on my current understanding, and I will add more components to it as that understanding evolves.
- Feature Flags: Lukas Vermeer says that feature flags are the gateway drug to experimentation. Feature flags are on-off buttons for the features of your product that you can smoothly toggle for a proportion of your users. Without feature flags, implementing switches for different features would require you to maintain various branches of your code. Feature flags remove the dependency on the software engineer to maintain and switch over these various branches. Once you start to use feature flags, experimentation inevitably becomes the next step. See the capabilities of a dedicated feature flagging tool here: LaunchDarkly
- Website Editors: For a digital marketer, the biggest limitation in online experimentation is usually getting a software engineer to implement the changes that they want to test. Third-party experimentation tools like VWO and AB Tasty hence provide a WYSIWYG (what you see is what you get) editor that lets you play around with various elements of your webpage to implement your idea without writing code. Website editors also let you add elements like forms and surveys to your website to run an experiment. Most experimenters hence prefer a tool with a good website editor that helps them start experimenting right away.
- Metric Managers: Effective experimentation depends heavily on the metrics that you choose to track. Advanced experimentation companies usually define success metrics (those that you want to improve) and guardrail metrics (those that must not degrade in the process) for every team. This helps various teams evolve the product in harmony without hurting each other's work. Large-scale experimentation engines require an interface to manage and track these metrics across teams. Developing effective metrics to track dynamic systems is a broader field in itself. Ronny Kohavi recommends the book How to Measure Anything by Douglas Hubbard to understand this area deeply.
- Framework for Concurrent Experiments: Most experimenters do not want to wait for one experiment to end before running another one. Hence, modern experimentation engines usually have the capability to run multiple experiments at once. At any time, a single user can be a part of multiple experiments on websites such as Amazon and Facebook. Running concurrent experiments requires you to check various things to ensure that all experiments are trustworthy. One such thing is interaction effects. Imagine that someone runs an experiment to test a change of text color from black (C1) to red (V1), while someone else runs an experiment that changes the background color from white (C2) to red (V2). If each experiment splits its traffic evenly and independently, 25% of the visitors land in the V1V2 bucket, where they cannot see the red text on a red background and hence will not convert. Such cases are called interaction effects.
- Hazard Management: Advanced experimentation engines usually have the autonomy to kill an experiment when a hazard occurs. A hazard in this context could be that a new experiment is showing interaction effects with other experiments, or that a guardrail metric is being hurt. In such cases, the experimentation engine automatically kills the experiment (stops diverting visitors to that variation) and notifies the owner of the experiment. Hazard managers are extremely effective if you want to build a free culture of experimentation in your company, where any employee can test out their idea without management approval.
- Social Networking Features: Large companies trying to build a culture of experimentation often want to have extensive discussions over the results of an experiment. This lets new experimenters learn quickly and lets interesting findings be shared across the company for everyone to see. Hence, these companies often have social networking features such as like, comment, and share baked into their experimentation engines. A social network for experimentation helps accelerate the adoption and reinforcement of the culture of experimentation.
- Institutional Memory: When tens of thousands of experiments are being run in a single year, it becomes extremely important that all past learnings are available to new experimenters and that old experiments are searchable. Hence, experimenters often tag their experiments with hashtags for future reference, and they are encouraged to take past results of similar ideas into account whenever they try out a new idea. The institutional memory of experimentation protects a company from going into a loop of testing similar ideas again and again.
- Deeper Data Analysis: The data collected for experimentation is rich, and a lot of interesting insights can be derived from it. Most experimentation engines implement some advanced algorithms to make the best use of this data. A company called Statsig implements the Ultrasound feature, which lets you see a summary of which goals are being impacted the most by which experiments. Others like VWO let you slice and dice your data to see conversion rates across various segments of your audience. All such algorithms provide helpful insights for future experimentation.
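To make the Feature Flags and Framework for Concurrent Experiments items above a little more concrete, here is a minimal sketch of one common way to decide which users see what: hash the user id together with the name of the flag or experiment. This is my own illustration and not the method of any particular vendor. The function names (`bucket`, `flag_enabled`, `assign_variant`, `joint_bucket_shares`), the even 50/50 split, and the experiment names are all hypothetical.

```python
import hashlib


def bucket(user_id: str, salt: str) -> float:
    """Map a user to a stable number in [0, 1) for one flag or experiment."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as a uniform-looking fraction.
    return int(digest[:8], 16) / 0x100000000


def flag_enabled(user_id: str, flag_name: str, rollout: float) -> bool:
    """Turn a flag on for roughly `rollout` (0.0 to 1.0) of all users."""
    return bucket(user_id, flag_name) < rollout


def assign_variant(user_id: str, experiment: str) -> str:
    """Assumed 50/50 split between control and variation."""
    return "variation" if bucket(user_id, experiment) < 0.5 else "control"


def joint_bucket_shares(user_ids, exp_a: str, exp_b: str) -> dict:
    """Share of users in each combination of two concurrent experiments."""
    counts = {}
    for uid in user_ids:
        key = (assign_variant(uid, exp_a), assign_variant(uid, exp_b))
        counts[key] = counts.get(key, 0) + 1
    total = len(user_ids)
    return {key: n / total for key, n in counts.items()}


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(20000)]
    # Hypothetical experiment names for the text-color and background-color tests.
    shares = joint_bucket_shares(users, "text-color", "background-color")
    for combo, share in sorted(shares.items()):
        print(combo, round(share, 3))
```

Because each experiment uses its own name as the salt, the two assignments behave as if they were independent, so each of the four combinations (including the problematic V1V2 one) should receive roughly a quarter of the traffic. The same user always gets the same answer, which is what lets a flag be toggled for a stable proportion of users without maintaining separate code branches.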
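The Hazard Management item above can also be sketched in a few lines. The check below is a simplified assumption on my part: it compares a guardrail conversion rate between control and variation with a pooled two-proportion z-score and flags the experiment when the variation is clearly worse. The function names and the threshold of -3.0 are hypothetical, and a real engine that checks repeatedly while the experiment runs would need a method that accounts for those repeated looks.

```python
import math


def guardrail_z(control_conv: int, control_n: int,
                variation_conv: int, variation_n: int) -> float:
    """Pooled two-proportion z-score; negative means the variation is worse."""
    p_control = control_conv / control_n
    p_variation = variation_conv / variation_n
    pooled = (control_conv + variation_conv) / (control_n + variation_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variation_n))
    if se == 0:
        # No variance at all (for example, zero conversions on both sides).
        return 0.0
    return (p_variation - p_control) / se


def should_kill(control_conv: int, control_n: int,
                variation_conv: int, variation_n: int,
                z_threshold: float = -3.0) -> bool:
    """Hypothetical rule: kill when the guardrail drop is beyond the threshold."""
    z = guardrail_z(control_conv, control_n, variation_conv, variation_n)
    return z < z_threshold


if __name__ == "__main__":
    # Guardrail rate falls from 10% to 8% on 10,000 visitors per arm.
    print(should_kill(1000, 10000, 800, 10000))
    # Guardrail rate is almost unchanged (10% vs 9.9%).
    print(should_kill(1000, 10000, 990, 10000))
```

In a full hazard manager, a positive result from a check like this would stop diverting visitors to the variation and notify the owner of the experiment.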
The way ahead
The market for experimentation is ripe, and a lot of experimentation engines are trying to build an exhaustive product that helps experimenters make the most of their experiments. Such sophistication in experimentation was not even thought of in the 20th century, and experimentation is evolving with each passing day. In future notes, I will try to expand on each of these elements and add more components to the list as I learn about them.