Data Drift

A Simple Explanation - By Varsha Saini

A machine learning model that produces accurate results today may not perform well a few months from now. There can be several reasons for this. One is a difference between the data used for training and the data the model receives for prediction. The degradation of a model's predictive power over time is called Model Drift.

Types of Model Drift

There are two types of Model Drift:

Data Drift
Concept Drift

1. Data Drift

Data Drift is a type of model drift in which the distribution of the input data changes over time. As a result, a model built on the old data may no longer be valid for predictions on new data.
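
One common way to check whether a feature's distribution has changed is to compare a reference sample (for example, the training data) with a recent sample. The sketch below is only an illustration under stated assumptions: it uses the two-sample Kolmogorov-Smirnov statistic as the comparison, the data is synthetic, and the names `ks_statistic`, `train_feature`, `same_feature` and `shifted_feature` are hypothetical.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The gap between the two step functions can only change at observed
    # values, so it is enough to evaluate both CDFs at each of them.
    for value in ref + cur:
        ref_cdf = bisect_right(ref, value) / len(ref)
        cur_cdf = bisect_right(cur, value) / len(cur)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap


# Synthetic data (an assumption for illustration): one feature at training
# time, a later sample from the same distribution, and a later sample whose
# mean has shifted.
rng = random.Random(0)
train_feature = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same_feature = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted_feature = [rng.gauss(1.0, 1.0) for _ in range(2000)]

ks_same = ks_statistic(train_feature, same_feature)
ks_shifted = ks_statistic(train_feature, shifted_feature)
print(f"same distribution:    {ks_same:.3f}")
print(f"shifted distribution: {ks_shifted:.3f}")
```

A value close to 0 suggests the two samples look alike, while a clearly larger value suggests the feature has drifted. The threshold that should trigger an alert depends on the sample size and the application.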

2. Concept Drift

Concept Drift is a type of model drift in which the properties of the dependent (target) variable change over time, so the relationship between the independent variables and the dependent variable shifts. The relationship that the machine learning model learned from the old data is then no longer valid.
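
The effect can be illustrated with a small synthetic example. In the sketch below, which is an illustration under assumptions rather than a detection method, the inputs keep the same distribution throughout, but the true relationship changes from roughly y = 2x to roughly y = 3x after training. The names `fit_line`, `mean_abs_error` and `sample`, and the slopes themselves, are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute prediction error of a (slope, intercept) model."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)


def sample(n, true_slope):
    """Draw inputs from the same range every time; only the slope varies."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


# Train while the relationship is y ~ 2x.
train_x, train_y = sample(500, true_slope=2.0)
model = fit_line(train_x, train_y)

# Evaluate on fresh data before and after the relationship changes.
old_x, old_y = sample(500, true_slope=2.0)
new_x, new_y = sample(500, true_slope=3.0)

mae_before = mean_abs_error(model, old_x, old_y)
mae_after = mean_abs_error(model, new_x, new_y)
print(f"error before concept drift: {mae_before:.2f}")
print(f"error after concept drift:  {mae_after:.2f}")
```

Because the input distribution is unchanged here, a check that looks only at the features would not flag anything. The drift shows up only when predictions are compared with the actual outcomes.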

Factors that can Cause Data to Drift

Error in Data Collection: There may be a mistake in collecting or labeling the new data that the model receives for prediction.
Time Gap: There may be a huge gap between the time when a model was trained and when predictions are made.
Seasonality: The data may have changed due to its seasonal nature.
Data Distribution: There may be a data shift or statistical changes in the feature values.
Addition of New Value: There may be the addition of a new category for some feature that was not available at the time of training.
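
Some of these factors are straightforward to check for. As a minimal sketch of the last one, the snippet below lists categories that appear in new data but were never seen during training. The function name `unseen_categories` and the payment-method values are hypothetical and chosen only for illustration.

```python
def unseen_categories(train_values, new_values):
    """Return the categories present in new data but absent from training data."""
    return sorted(set(new_values) - set(train_values))


# Hypothetical categorical feature at training time and at prediction time.
train_payment = ["card", "cash", "card", "bank_transfer"]
new_payment = ["card", "wallet", "cash", "wallet", "crypto"]

new_categories = unseen_categories(train_payment, new_payment)
print(new_categories)  # ['crypto', 'wallet']
```

An empty list means every category in the new data was already present at training time.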
