Evaluating search engines and other online systems is difficult because they change constantly. A/B testing metrics play a central role in assessing how effective these systems are. This article focuses on qualitative evaluation, specifically relevance, which is essential for measuring search engine quality. Relevance involves finding the matching records and then ordering them based on factors such as popularity, ratings, UI/UX, personalization, and merchandising.
Different business roles hold different notions of what good relevance is, which makes it hard to define A/B testing metrics that reflect relevance in a measurable way. Two properties matter when choosing a metric: directionality and sensitivity. Directionality means the metric has a clear interpretation, so that a movement in one direction can be read unambiguously as an improvement or a degradation. Sensitivity means the metric can detect small changes in the search experience.
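To make directionality and sensitivity concrete, the sketch below runs a two-proportion z-test on a rate metric such as CTR. The test itself is a standard statistical technique swapped in for illustration; the article does not prescribe it. The function name, the counts, and the 0.05 threshold are all hypothetical. The sign of z gives the direction of the change, and the p-value shows whether the experiment was sensitive enough to detect it.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the rate difference B - A.

    Hypothetical helper for illustration: a pooled two-proportion z-test.
    """
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the null hypothesis that both variants share one rate
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed data: the same lift in CTR (10.0% -> 10.5%) at two traffic levels.
z_small, p_small = two_proportion_z_test(100, 1_000, 105, 1_000)
z_large, p_large = two_proportion_z_test(10_000, 100_000, 10_500, 100_000)

# Directionality: a positive z means variant B has the higher rate.
# Sensitivity: only the larger sample separates the lift from noise.
print(f"small sample: z={z_small:.2f}, p={p_small:.3f}")
print(f"large sample: z={z_large:.2f}, p={p_large:.5f}")
```

With these assumed numbers the direction is the same in both runs, but only the larger experiment falls below the 0.05 threshold. A metric that needs less traffic to reach that point for the same underlying change is the more sensitive one.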
Commonly used A/B testing metrics include revenue, conversion rate (CR), and click-through rate (CTR). Each has its pros and cons, so combining them can give more reliable results than any one of them alone. The chosen metric should be aligned with the specific use case and business objectives. Relying on a single type of metric also invites gaming, where the metric improves while the underlying experience does not, so a pragmatic evaluation of an online system during A/B testing should draw on more than one.
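As a minimal sketch of combining metrics, the example below computes CTR, CR, and revenue per session for two variants and checks whether they move in the same direction. The class, the function names, the per-session definitions, and all the counts are assumptions made for illustration; real systems may define CTR and CR differently (for example per query or per user).

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant totals collected during an A/B test."""
    sessions: int
    clicks: int
    orders: int
    revenue: float

    @property
    def ctr(self) -> float:
        # Click-through rate, assumed here to be clicks per search session
        return self.clicks / self.sessions

    @property
    def cr(self) -> float:
        # Conversion rate, assumed here to be orders per search session
        return self.orders / self.sessions

    @property
    def revenue_per_session(self) -> float:
        return self.revenue / self.sessions


def relative_lifts(control: VariantStats, treatment: VariantStats) -> dict:
    """Relative change of each metric, treatment versus control."""
    def lift(before: float, after: float) -> float:
        return (after - before) / before

    return {
        "ctr": lift(control.ctr, treatment.ctr),
        "cr": lift(control.cr, treatment.cr),
        "revenue_per_session": lift(
            control.revenue_per_session, treatment.revenue_per_session
        ),
    }


def metrics_agree(lifts: dict) -> bool:
    """True when every metric moved in the same direction."""
    return len({value > 0 for value in lifts.values()}) == 1


# Made-up counts: the treatment attracts more clicks but fewer orders.
control = VariantStats(sessions=10_000, clicks=3_000, orders=300, revenue=15_000.0)
treatment = VariantStats(sessions=10_000, clicks=3_600, orders=290, revenue=14_200.0)

lifts = relative_lifts(control, treatment)
for name, value in lifts.items():
    print(f"{name}: {value:+.1%}")
print("metrics agree:", metrics_agree(lifts))
```

In this made-up case CTR rises while CR and revenue per session fall, which is the pattern a CTR-only evaluation would miss. A disagreement like this is a prompt for closer inspection, not a verdict on its own.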