Data Drift

A Simple Explanation - By Varsha Saini

A machine learning model that produces accurate results today may not perform well a few months from now. There can be several reasons for this. One of them is a difference between the data used for training and the data the model receives for prediction. The degradation of a model's predictive power over time is called Model Drift.

Types of Model Drift

There are two types of Model Drift:

  1. Data Drift
  2. Concept Drift

1. Data Drift

Data Drift is a type of model drift in which the distribution of the input data changes over time. As a result, a model that was trained on the old data may no longer give reliable predictions on new data.
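One common way to check whether a feature's distribution has changed is to compare its values at training time with recent values using the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical cumulative distributions. The following is a minimal sketch in plain Python, not a production monitor. The function name `ks_statistic`, the synthetic Gaussian data, and the size of the shift are all assumptions made for illustration.

```python
import random
from bisect import bisect_right

def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = no overlap)."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # Empirical CDFs only change at observed values, so checking those is enough.
    for x in ref + cur:
        ref_cdf = bisect_right(ref, x) / len(ref)
        cur_cdf = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap

rng = random.Random(0)
# Hypothetical feature values: training data, fresh data from the same
# distribution, and fresh data whose mean has shifted.
train_feature = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same_feature = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted_feature = [rng.gauss(1.5, 1.0) for _ in range(1000)]

stat_same = ks_statistic(train_feature, same_feature)
stat_shifted = ks_statistic(train_feature, shifted_feature)
print(f"no drift: {stat_same:.3f}, drifted: {stat_shifted:.3f}")
```

A statistic close to 0 suggests the two samples look alike, while a larger value suggests the feature has drifted. The threshold at which to raise an alert depends on the sample size and the application, so it is left open here.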

2. Concept Drift

Concept Drift is a type of model drift in which the statistical properties of the dependent (target) variable change over time, so the relationship between the independent variables and the dependent variable is no longer the one seen during training. The relationship that the machine learning model learned is therefore no longer valid.
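To illustrate a learned relationship that stops being valid, the sketch below keeps the input distribution the same and changes only the rule that links the input to the label. Everything in it is hypothetical: `model_predict` is a stand-in for a trained model, and the thresholds 0.5 and 0.7 are arbitrary values chosen for the example.

```python
import random

def model_predict(x):
    # Stand-in for a trained model that learned "label is 1 when x > 0.5".
    return 1 if x > 0.5 else 0

def accuracy(xs, ys):
    """Fraction of inputs for which the model's prediction matches the label."""
    correct = sum(model_predict(x) == y for x, y in zip(xs, ys))
    return correct / len(xs)

rng = random.Random(42)
xs_old = [rng.random() for _ in range(2000)]
xs_new = [rng.random() for _ in range(2000)]  # same input distribution as before

# Labels at training time follow the rule the model learned.
ys_old = [1 if x > 0.5 else 0 for x in xs_old]
# Later, the true relationship between input and label has changed.
ys_new = [1 if x > 0.7 else 0 for x in xs_new]

acc_old = accuracy(xs_old, ys_old)
acc_new = accuracy(xs_new, ys_new)
print(f"accuracy before: {acc_old:.2f}, after: {acc_new:.2f}")
```

Because the inputs look the same in both periods, a check on feature distributions alone would not flag this change. In this sketch it shows up only when predictions are compared with the true labels.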

Factors that can Cause Data to Drift

  1. Error in Data Collection: There may be mistakes in collecting or labeling the new data that the model receives for prediction.
  2. Time Gap: There may be a huge gap between the time when a model was trained and when predictions are made.
  3. Seasonality: The data may have changed due to its seasonal nature.
  4. Data Distribution: There may be a data shift or statistical changes in the feature values.
  5. Addition of New Value: A feature may start taking a new category that was not present in the data at the time of training.
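The last factor, a new category appearing in a feature, is simple to check for directly. The sketch below compares the categories seen at training time with those in incoming data. The function name `unseen_category_report` and the payment-method values are made up for illustration.

```python
from collections import Counter

def unseen_category_report(training_values, incoming_values):
    """Return counts of unseen categories and the share of incoming rows they affect."""
    known = set(training_values)
    unseen_counts = Counter(v for v in incoming_values if v not in known)
    if incoming_values:
        share = sum(unseen_counts.values()) / len(incoming_values)
    else:
        share = 0.0
    return dict(unseen_counts), share

# Hypothetical categorical feature at training time and at prediction time.
train_payment = ["card", "cash", "card", "bank_transfer", "cash"]
new_payment = ["card", "wallet", "cash", "wallet", "card"]

unseen, share = unseen_category_report(train_payment, new_payment)
print(unseen, share)  # {'wallet': 2} 0.4
```

Reporting the share of affected rows alongside the category names helps separate a rare stray value from a change large enough to justify retraining.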