This case study outlines how we utilized Explainable AI (XAI) techniques to monitor model performance and detect concept and data drift for our client DAT Group. By implementing a comprehensive and future-proof approach, we empowered our client to maintain optimal model performance and ensure data integrity, resulting in cost savings and increased trust in their models.
Key Challenges
Our client is DAT Group, an international company operating as a trust in the automotive industry. For over 90 years, they have provided data products and services in the automotive sector focused on enabling a digital vehicle lifecycle.
One of their key products is the provision of price estimates for used cars. This is used by various customers, from insurance companies to original equipment manufacturers. For the price estimates, they leveraged both domain expertise and market data. The workflows for processing and analyzing data were primarily manual, which made it impossible to scale, accelerate, and automate the information retrieval process.
As part of the AI roadmap we supported DAT with, we automated these manual data processes and developed a machine learning solution that enabled data-driven estimates of used car prices. These solutions allowed the team to make real-time, data-driven decisions.
As time passed, our client’s teams faced several challenges related to their machine learning model performance, including:
- Degradation of model performance over time due to changing data patterns and market conditions (concept drift and data drift).
- Difficulty in orchestrating and determining the right time to trigger model re-training in order to maintain optimal performance.
- Ensuring data integrity for new incoming data, preventing the introduction of noise and biases into the model.
- Manual monitoring of the impact of data drift on model performance.
Our Approach to Model Monitoring using XAI
To address these challenges, we took a comprehensive and future-proof approach comprising the following steps:
Implementation of Automated Data Drift Detection using SHAP
We leveraged the SHAP (SHapley Additive exPlanations) library to continuously evaluate and track the SHAP values for every new incoming data point. SHAP values quantify the contribution of each individual feature to the model's prediction for a single data point. A change in the distribution of SHAP values indicates that the statistical patterns in the new data may have shifted over time, such that the model's assumptions about the data are no longer accurate. This phenomenon is commonly referred to as concept drift. By monitoring the SHAP values, we can detect precisely when such drift occurs and take appropriate measures.
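The idea can be sketched as follows: compare the per-feature distribution of SHAP values on a reference window against a recent window, and flag features whose contribution pattern has changed. This is a minimal illustration using synthetic arrays in place of real SHAP output (in practice the arrays would come from something like `shap.Explainer(model)(X).values`); the feature names and the two-sample Kolmogorov-Smirnov test are our illustrative choices, not necessarily the exact test used in the project.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_shap_drift(reference_shap, new_shap, feature_names, alpha=0.01):
    """Flag features whose SHAP value distribution has shifted.

    reference_shap, new_shap: 2-D arrays (n_samples, n_features) of
    SHAP values. Returns the list of drifted feature names.
    """
    drifted = []
    for j, name in enumerate(feature_names):
        # Two-sample Kolmogorov-Smirnov test on the per-feature SHAP
        # distributions: a small p-value means this feature's
        # contribution pattern has changed between the two windows.
        stat, p_value = ks_2samp(reference_shap[:, j], new_shap[:, j])
        if p_value < alpha:
            drifted.append(name)
    return drifted

# Synthetic illustration: the "mileage" contribution keeps its
# distribution, while the "age" contribution shifts.
rng = np.random.default_rng(42)
ref = np.column_stack([rng.normal(0, 1, 1000), rng.normal(0, 1, 1000)])
new = np.column_stack([rng.normal(0, 1, 1000), rng.normal(1.5, 1, 1000)])
print(detect_shap_drift(ref, new, ["mileage", "age"]))
```

Running the check per feature, rather than on the raw inputs alone, ties the drift signal directly to how the model actually uses each feature.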
Continuous Visualization in a Model Monitoring Dashboard
We developed a dynamic dashboard that visualizes model performance through both standard evaluation metrics, such as RMSE and MAE, and per-feature SHAP values. This allowed the client to easily monitor their models, identify any performance issues, and understand how data drift was affecting the model's accuracy.
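For reference, the two headline metrics on such a dashboard are straightforward to compute; this is a minimal sketch (the helper name and the toy price values are illustrative, not from the project):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Point-estimate metrics typically tracked per monitoring window."""
    errors = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    rmse = float(np.sqrt(np.mean(errors ** 2)))  # penalizes large misses
    mae = float(np.mean(np.abs(errors)))         # average absolute miss
    return {"RMSE": rmse, "MAE": mae}

# Toy example: predicted vs. actual used-car prices (EUR).
metrics = regression_metrics([10000, 15000, 9000], [10500, 14000, 9200])
print(metrics)
```

Tracking both matters: RMSE reacts strongly to a few large pricing errors, while MAE reflects the typical error a customer would see.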
Automated Notification for Detected Data Drift
We set up an automated email notification system to alert the Product Owner and Data Scientists either when model performance is degrading or when concept drift is detected. This ensured that the relevant stakeholders were promptly informed, and could take appropriate actions, such as adjusting the model’s parameters or initiating retraining, depending on the severity of the drift.
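A sketch of the trigger logic, with message construction separated from sending so the decision rules stay testable. The threshold value, function names, and email addresses are illustrative assumptions, not the project's actual configuration:

```python
import smtplib
from email.message import EmailMessage

RMSE_THRESHOLD = 1200.0  # assumed tolerance; tuned per product in practice

def build_alert(rmse, drifted_features):
    """Return an alert email if either trigger fires, else None."""
    reasons = []
    if rmse > RMSE_THRESHOLD:
        reasons.append(f"RMSE {rmse:.0f} exceeds threshold {RMSE_THRESHOLD:.0f}")
    if drifted_features:
        reasons.append("concept drift detected on: " + ", ".join(drifted_features))
    if not reasons:
        return None
    msg = EmailMessage()
    msg["Subject"] = "[Model Monitoring] Action required"
    msg["From"] = "monitoring@example.com"      # hypothetical addresses
    msg["To"] = "product-owner@example.com"
    msg.set_content("Triggered by:\n- " + "\n- ".join(reasons))
    return msg

def send_alert(msg, host="localhost"):
    # Hand the built message to an SMTP relay.
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Keeping the two triggers (metric degradation and detected drift) in one place makes it easy to record in the alert *why* it fired, which helps stakeholders choose between parameter adjustment and full retraining.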
Thorough Instruction on Model Retraining
We provided in-depth training to the Product Owner and Data Scientists on how to retrain their models when necessary. This guidance covered various aspects, including identifying the need for retraining, selecting the appropriate training data, validating the new model’s performance, and deploying the updated model in production. This enabled them to maintain optimal model performance and make better decisions on when to trigger retraining.
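One part of that guidance, the validation step before deployment, can be captured in a simple gate: promote a retrained model only if it beats the current production model on a fixed holdout set by a meaningful margin. This is a minimal sketch with an assumed relative-improvement threshold, not the project's actual deployment criterion:

```python
def should_deploy(current_rmse, candidate_rmse, min_improvement=0.02):
    """Promote the retrained model only if its holdout RMSE improves on
    the current production model by at least `min_improvement` (relative).
    This avoids churn from deploying models that merely match the old one."""
    return candidate_rmse < current_rmse * (1.0 - min_improvement)

print(should_deploy(1000.0, 950.0))  # 5% better -> True
print(should_deploy(1000.0, 990.0))  # only 1% better -> False
```

Comparing both models on the same holdout set keeps the decision fair; evaluating the candidate on the data it was just trained on would overstate its quality.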
Benefits
By implementing this comprehensive approach to model monitoring using XAI, our client experienced several benefits, including:
- Prevention of outdated models in production, ensuring that their models continued to provide accurate predictions as data patterns evolved.
- Improved model performance over time, as the system was able to adapt to changing data patterns and maintain a high level of accuracy.
- Increased trust in their models due to higher visibility of performance metrics, enabling stakeholders to make more informed decisions based on the model’s predictions.
- Cost savings by triggering retraining only when required, avoiding unnecessary retraining efforts and reducing the overall maintenance costs.
- Greater control over the quality of their models, allowing them to fine-tune model parameters and ensure consistent performance.
Team Involved
One Data Scientist and XAI Engineer collaborated with our client for 4 months on this project. They worked closely with the client’s data science team to design and implement the monitoring solution, provided training on model retraining, and supported the client throughout the process of integrating the solution into their existing infrastructure.