Both visible and invisible changes can be tested with A/B testing.
A popular example is an Amazon A/B test which showed that every 100 ms increase in page load time decreased sales by 1%.
New experiences are not well suited to A/B testing, because a new experience can trigger change aversion, where users dislike the change and prefer to stick with the old version, or a novelty effect, where users are excited and want to try out everything.
In both cases, it is difficult to define a baseline for comparison and to decide how long the test should run.
Metric selection needs to consider both sensitivity and robustness.
Sensitivity means that a metric should pick up the changes we care about, and robustness means that it should not move much in response to irrelevant effects.
For example, a “mean” is usually sensitive but not robust, because it is easily pulled by outliers. A “median” is robust, but it is not sensitive to changes that affect only a small group of users.
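The trade-off between the two metrics can be seen on a small sample. The sketch below uses made-up per-user revenue values (the numbers and variable names are assumptions for illustration, not data from a real experiment) and compares how the mean and the median react to a single outlier and to a change that affects only two users.

```python
from statistics import mean, median

# Hypothetical per-user revenue values; the numbers are made up for illustration.
baseline = [10, 12, 11, 9, 13, 10, 12, 11, 10, 12]

# Irrelevant effect: one extreme outlier replaces a typical value.
with_outlier = baseline[:-1] + [500]

# Relevant effect: only two of the ten users spend noticeably more.
small_group_change = [10, 12, 11, 9, 20, 10, 22, 11, 10, 12]


def summarize(values):
    """Return the mean and the median of a sample."""
    return {"mean": mean(values), "median": median(values)}


for name, sample in [
    ("baseline", baseline),
    ("with_outlier", with_outlier),
    ("small_group_change", small_group_change),
]:
    print(name, summarize(sample))
```

In this toy data the single outlier moves the mean from 11 to 59.8 while the median stays at 11, so the mean is not robust. The small-group change moves the mean to 12.7 while the median still stays at 11, so the median is not sensitive.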
In order to consider both sensitivity and robustness in the metric selection, we can apply filtering and segmentation while creating the control and experiment samples.
Filtering and segmentation can be based on user demographics (e.g. age, gender), the language of the platform, internet browser, device type (e.g. iOS or Android), cohort, and so on.
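As a rough sketch of how filtering and segmentation might look in practice, the example below filters a list of users by platform language, assigns each user to the control or experiment group with a deterministic hash, and then reports the conversion rate per device segment. The user records, the field names, the `assign_group` helper, and the salt value are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical user records; field names and values are illustrative only.
USERS = [
    {"id": "u1", "language": "en", "device": "iOS", "converted": True},
    {"id": "u2", "language": "en", "device": "Android", "converted": False},
    {"id": "u3", "language": "de", "device": "iOS", "converted": True},
    {"id": "u4", "language": "en", "device": "iOS", "converted": False},
    {"id": "u5", "language": "en", "device": "Android", "converted": True},
    {"id": "u6", "language": "en", "device": "Android", "converted": False},
]


def assign_group(user_id, salt="experiment-1"):
    """Deterministically assign a user to 'control' or 'experiment'."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "experiment" if int(digest, 16) % 2 else "control"


def filter_users(users, language):
    """Filtering: keep only users on the given platform language."""
    return [u for u in users if u["language"] == language]


def conversion_by_segment(users, segment_key):
    """Segmentation: conversion rate per (segment value, group) pair."""
    counts = defaultdict(lambda: [0, 0])  # [conversions, total users]
    for u in users:
        cell = counts[(u[segment_key], assign_group(u["id"]))]
        cell[0] += int(u["converted"])
        cell[1] += 1
    return {key: conv / total for key, (conv, total) in counts.items()}


english_users = filter_users(USERS, language="en")
for key, rate in sorted(conversion_by_segment(english_users, "device").items()):
    print(key, rate)
```

Hashing the user id keeps each user in the same group across sessions, which is one common way to make the split reproducible. A real experiment would need far more users per segment than this toy list before the rates mean anything.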