Take My Drift Away
Model drift refers to a change in the distribution of a model's input or output data over time, which can degrade performance. Monitoring and troubleshooting drift is essential for keeping models relevant, especially in businesses where the underlying data is constantly evolving. Drift falls into two broad categories: feature drift, a change in the distribution of the inputs, and concept drift, a change in the distribution of the outputs or actuals (the ground-truth outcomes).

To measure drift, compare the distributions of inputs, outputs, and actuals between training and production using distribution distance measures such as the Population Stability Index (PSI), Kullback-Leibler (KL) divergence, and the Wasserstein distance (sketched below). It is important to relate these metrics to key business KPIs and to set up threshold-based alerts on drift in the distributions.

When a model drifts, retraining may be necessary, but how new data is sampled and represented in the training set requires care: overweighting recent data risks overfitting to it, while underweighting it leaves the new dynamics under-represented. Tuning the tradeoff between these two competing forces helps strike the right balance (see the sampling sketch below). If the business itself has changed significantly, a simple retrain may not suffice, and the entire model may need revision.

Troubleshooting drift involves identifying which input features or outcome variables have changed, understanding how their distributions have shifted, and potentially adjusting the model structure or feature engineering to adapt to the new dynamics. Regularly reviewing model performance and maintaining open communication with end users helps teams address drift proactively and improve models over time.
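To make the measurement step concrete, here is a minimal sketch of comparing a training (baseline) distribution for a single feature against its production distribution using the three measures named above. The bin count and the PSI alert threshold of 0.2 are illustrative assumptions, not values from the article.

```python
import numpy as np
from scipy.special import rel_entr
from scipy.stats import wasserstein_distance

def binned_probs(baseline, production, n_bins=10, eps=1e-6):
    """Histogram both samples on bins derived from the baseline
    (production values outside that range are dropped; acceptable
    for a sketch). eps smooths empty bins so the log-based
    measures stay finite."""
    edges = np.histogram_bin_edges(baseline, bins=n_bins)
    p, _ = np.histogram(baseline, bins=edges)
    q, _ = np.histogram(production, bins=edges)
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return p, q

def psi(p, q):
    """Population Stability Index: sum over bins of (q - p) * ln(q / p)."""
    return float(np.sum((q - p) * np.log(q / p)))

def kl_divergence(p, q):
    """KL divergence of the production distribution from the baseline."""
    return float(np.sum(rel_entr(q, p)))

# Simulated data: production has a shifted mean and wider spread.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)
production = rng.normal(loc=0.4, scale=1.2, size=10_000)

p, q = binned_probs(baseline, production)
print(f"PSI:         {psi(p, q):.3f}")
print(f"KL div:      {kl_divergence(p, q):.3f}")
print(f"Wasserstein: {wasserstein_distance(baseline, production):.3f}")

# Threshold-based alert: PSI > 0.2 is a common rule-of-thumb cutoff
# for significant shift (an assumption here, not from the article).
if psi(p, q) > 0.2:
    print("ALERT: feature distribution has drifted from the baseline")
```

In practice this comparison would run per feature and per model output over a rolling window of production traffic, with the alert wired to whichever threshold tracks the relevant business KPI.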
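And a hypothetical sketch of the retraining-sample tradeoff: weighting production rows by recency when assembling a retraining set. The half-life parameter is the knob that trades recency (risking overfitting to the latest behavior) against historical representation; the function name and all values here are illustrative, not from the article.

```python
import numpy as np
import pandas as pd

def sample_for_retraining(df, timestamp_col, n_samples,
                          half_life_days=30.0, seed=0):
    """Draw a retraining sample where each row's weight decays
    exponentially with its age: a short half-life chases recent
    behavior, a long one preserves historical representation."""
    age_days = (df[timestamp_col].max() - df[timestamp_col]).dt.days
    weights = 0.5 ** (age_days / half_life_days)
    return df.sample(n=n_samples, weights=weights, random_state=seed)

# Example: 1,000 rows spread over a year, sampled with a 30-day half-life.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-06-22", periods=365, freq="D")
df = pd.DataFrame({"ts": rng.choice(dates.to_numpy(), size=1_000)})
retrain_df = sample_for_retraining(df, "ts", n_samples=200)
print(retrain_df["ts"].min(), "to", retrain_df["ts"].max())
```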
Company
Arize
Date published
June 21, 2021
Author(s)
Aparna Dhinakaran
Word count
1571
Language
English