How to Detect Concept Drift Without Labels

In a previous article, we explored the basics of concept drift. Concept drift occurs when the distribution of the data changes over time.
This post continues to explore this topic. Here, you'll learn how to detect concept drift in problems where you don't have access to labels. This task is challenging because, without labels, we can't evaluate a model's performance.
Let's dive in.
Introduction
Datasets that evolve over time are prone to concept drift. Changes in the distribution can undermine models and the accuracy of their predictions. So, it's important to detect and adapt to these changes to keep models up to date.
Most change detection approaches rely on tracking the model's error. The idea is to trigger an alarm when this error increases significantly. Then, some adaptation mechanism kicks in, such as retraining the model.
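To make the contrast concrete, here's a minimal sketch of that label-based approach, using the DDM detector from scikit-multiflow on a toy sequence of prediction errors (the error sequence is made up for illustration; this detector is not used later in the article):

from skmultiflow.drift_detection import DDM

ddm = DDM()
# toy sequence of prediction outcomes: 0 = correct prediction, 1 = error
# (in practice, these come from comparing the model's predictions with the labels)
errors = [0] * 500 + [1] * 50
for i, err in enumerate(errors):
    ddm.add_element(err)
    if ddm.detected_change():
        print(f'Error rate increased at instance {i} -- time to retrain')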
In the previous article, we argued that getting access to labels may be difficult in some cases. Examples appear in many domains, such as fraud detection or credit risk assessment. In the latter, it can take several years for a person to default (and thus provide a label for their assessment).
In these cases, you have to detect changes using approaches that do not depend on performance.
Change detection without labels
In general, you have two options to detect changes without labels:
- Track the model's predictions.
- Track the input data (explanatory variables).
In both cases, change is detected when the distribution changes significantly.
How does this work exactly?
Change detection without labels is done by comparing two samples of data. One sample represents the most recent data, also referred to as the detection window. The other contains data from the original distribution (the reference window).
So, the detection process is split into two parts:
- Building the two samples
- Comparing the two samples using some statistical test
Let's look at each problem in turn.
1. Building the two samples

First, you need to select the data to be tracked.
Tracking the input variables is better for detecting covariate shift. Label shift and concept drift are better detected by tracking the model's predictions. These predictions can be either the predicted probabilities or the estimated class.
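As a rough sketch of these two options, here's how the tracked signals could be built from a fitted model and a batch of recent, unlabeled data. The model and data below are toy placeholders (a scikit-learn logistic regression on synthetic data), not part of the example used later:

import numpy as np
from sklearn.linear_model import LogisticRegression

# toy training data and model, just to illustrate the two tracking options
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

X_recent = rng.normal(size=(200, 3))  # a batch of recent, unlabeled data

# option 1: track an input variable (best suited for covariate shift)
signal_input = X_recent[:, 0]

# option 2: track the model's predictions (label shift and concept drift)
signal_proba = model.predict_proba(X_recent)[:, 1]  # predicted probabilities
signal_class = model.predict(X_recent)              # estimated classes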
Then, you are ready to build the two data samples.
One sample is the reference window. It represents the original distribution on which the model was built. The other sample is the detection window. This sample contains information about the latest instances.
Sliding vs Fixed Reference Window
The reference window can be either fixed or sliding.
- The data in a fixed reference window remain the same over time. These can be the training cases, for example.
- In a sliding reference window, the observations are the subset of data that arrives right before the data in the detection window.
Here's a visual description of these windows:

The type of window you should use depends on your goal. A sliding reference window is more sensitive to drastic, abrupt changes.
A fixed reference window may be more appropriate for capturing gradual changes. It also helps to quantify how different the current data is from the instances used to train the model. Gradual changes may not be visible if the reference and detection windows are adjacent.
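Here's a minimal sketch of the difference, assuming the stream values observed so far are collected in a plain Python list and both windows contain 1000 observations (the names and data below are illustrative):

window_size = 1000
values = list(range(5000))  # placeholder for the series observed so far

# the detection window always contains the latest observations
detection_window = values[-window_size:]

# fixed reference window: e.g., the data the model was trained on
fixed_reference = values[:window_size]

# sliding reference window: the observations arriving right before the detection window
sliding_reference = values[-2 * window_size:-window_size]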
Data windows vs Temporal windows
Besides choosing between a fixed or sliding reference window, you can also choose between data windows and temporal windows.
Data windows contain a fixed number of observations (say, 1000 instances). Temporal windows contain however many samples arrive within a fixed period. For example, the detection window could hold all data points collected in the past 24 hours.
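For illustration, here's a hedged sketch of the two window types using pandas (the timestamps and series below are made up):

import pandas as pd

# a toy series indexed by timestamp
ts = pd.Series(range(200),
               index=pd.date_range('2023-01-01', periods=200, freq='H'))

# data window: a fixed number of observations (the latest 100)
data_window = ts.iloc[-100:]

# temporal window: however many observations arrived in the past 24 hours
cutoff = ts.index.max() - pd.Timedelta(hours=24)
temporal_window = ts[ts.index > cutoff]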
Whichever type of windows you use, it's important to consider their size.
If the windows are too small, you may get too many false alarms, which leads to unnecessary costs from updating the model. Yet, if the windows are too large, it may take too long to detect changes.
2. Comparing the two samples using a statistical test
Now you have two samples of data to compare.
If no change has occurred, you expect both samples to follow the same distribution.
The comparison can be done using two-sample statistical tests. The Kolmogorov-Smirnov test is a common choice. It tests whether two samples come from the same distribution at a given significance level.
The Kolmogorov-Smirnov test, and similar ones, assume that their input data (the two windows) are univariate. If the samples are multivariate, you need to adapt your approach. Either transform the data to a univariate vector or apply the test to each variable and combine the results.
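For example, one simple way to handle multivariate windows (not used in the hands-on example below) is to run the test on each variable and combine the results with a Bonferroni correction of the significance level. A minimal sketch on synthetic data:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
reference = rng.normal(size=(1000, 5))  # multivariate reference window
detection = rng.normal(size=(1000, 5))  # multivariate detection window

# apply the KS test to each variable
p_values = [ks_2samp(reference[:, j], detection[:, j]).pvalue
            for j in range(reference.shape[1])]

# combine the results with a Bonferroni correction of the significance level
alpha = 0.001
change_detected = min(p_values) < alpha / reference.shape[1]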
Hands-on: Example using scikit-multiflow
Let's do a quick tutorial using Python.
Training stage
First, let's create a dataset and train a model.
from skmultiflow.data.sea_generator import SEAGenerator
from skmultiflow.meta import AdaptiveRandomForestClassifier
# creating a data stream
stream = SEAGenerator(classification_function=2,
                      random_state=112,
                      balance_classes=False,
                      noise_percentage=0.28)
# getting the next 1000 observations
X_train, y_train = stream.next_sample(1000)
# building a classifier using the adaptive random forest
model = AdaptiveRandomForestClassifier()
model.fit(X_train, y_train)
In the preceding code, we use the SEAGenerator class to create an artificial data stream. Then, we get 1000 observations from it and train an AdaptiveRandomForestClassifier model. This method is an extension of the Random Forest algorithm, tailored for data stream tasks.
Change detection stage
Now, we start the change detection process. Here's the setup we'll use in this example:
- Tracking the input data. For simplicity, we'll focus on the first variable only.
- Fixed reference window. We set the training instances as the reference window.
- Kolmogorov-Smirnov test to detect changes
Let's start by defining the reference window:
# setting up a fixed reference window based on the training data
fixed_reference_x1 = X_train[:,0]
Then, we process the data stream and check for changes. Check the comments for additional context.
from scipy.stats import ks_2samp

# buffer for the first variable
buffer_x1 = []

# processing each instance of the data stream
while stream.has_more_samples():
    # getting a new sample from the data stream
    X_i, y_i = stream.next_sample()

    # making the prediction
    pred = model.predict(X_i)

    # monitoring the variable x1
    ## adding the x1 value to the buffer
    buffer_x1.append(X_i[0][0])

    ## waiting until the buffer has at least 1000 observations
    if len(buffer_x1) < len(fixed_reference_x1):
        continue

    ## getting the detection window (latest 1000 records)
    detection_window = buffer_x1[-1000:]

    ## using the KS test
    test = ks_2samp(fixed_reference_x1, detection_window)

    ## checking if change occurred (p-value below 0.001)
    change_detected = test.pvalue < 0.001

    if change_detected:
        print('Update model')
For each new instance, we make a prediction using the trained model. We also add the value of the first variable to a buffer. The latest 1000 values of this buffer form the detection window.
Then, we use the Kolmogorov-Smirnov test to compare the two samples. If the p-value is below the significance level, we consider that the distribution has changed.
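For reference, here's a hedged sketch of how the same loop could track the model's predictions instead of an input variable. It would replace the monitoring loop above and assumes the classifier's predict_proba returns one probability column per class:

# reference window: predicted probabilities on the training data
reference_scores = model.predict_proba(X_train)[:, 1]
buffer_scores = []

while stream.has_more_samples():
    X_i, y_i = stream.next_sample()
    # track the predicted probability of the positive class
    buffer_scores.append(model.predict_proba(X_i)[0, 1])
    if len(buffer_scores) < len(reference_scores):
        continue
    test = ks_2samp(reference_scores, buffer_scores[-1000:])
    if test.pvalue < 0.001:
        print('Update model')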
Takeaways
- In some domains, labels are hard to get during the production stage of models. So, change detection often needs to be done without this information.
- Unsupervised change detection can be done by tracking the input variables or model's predictions.
- The tracking process is done with two windows: a reference window, which represents the distribution the model was built on; and a detection window with the latest observations.
- The two windows are compared with a statistical test such as Kolmogorov-Smirnov. If the test's p-value is below a significance threshold, you should consider updating the model.
Thank you for reading, and see you in the next story!