Aliasing: Your Time Series is Lying to You

Aliased Fan. Image by Author.

Time series data is everywhere and full of rich information. Financial markets, industrial processes, sensor readings, health monitors, network traffic, and economic indicators are just a few of the domains where time series analysis and signal processing are essential.

With advances in deep learning and other time series forecasting techniques taking the spotlight, attention has been drawn away from some fundamental properties of time series. Before embarking on any time series project, we must ask ourselves, "Can we trust this data?"

This article will explore a pathological property of discrete time series known as aliasing. Anyone concerned with frequency or seasonality analysis of time series must be keenly aware of aliasing and how it affects their bottom line. We will use the terms "time series" and "signal" interchangeably. Enjoy!

A Motivating Example

To understand what aliasing is and how deceiving it can be, let's begin with a canonical example. We will attempt to answer a question about an elementary oscillating signal. If you're unfamiliar with aliasing, the answer may be shocking.

The Question

Consider the following time series plotted over a one second duration. Each dot represents a sample from a signal, and the lines are linear interpolations through the samples that (supposedly) help us visualize the signal.

An Oscillating Signal Sampled Over One Second. Image by Author.

Moreover, assume the underlying signal from which we are sampling is continuous. This means, at any time t, it is possible to measure the signal's value. Due to computational and memory constraints, we pick a finite number of time points to sample the signal.

The question we need to answer is:

How many peaks does the underlying signal have?

Said differently, what is the frequency at which the signal oscillates? Before reading ahead to the answer, think critically about this question. How would you go about answering it?

Perhaps an intuitive first approach would be to count the number of dots that equal 1. By doing this, you might say the signal has 10 peaks during the second. That is, the signal has a fundamental frequency of 10 hertz (Hz), or 10 cycles per second.

The Answer

It's impossible to know how many peaks the underlying signal has without more information.

You read that right. If all we have is the data given in this example, it's impossible to know with 100% certainty what the fundamental frequency of the underlying signal is. Lucky for us, we know how this data was generated and what the true frequency content should be.

Believe it or not, the underlying signal has 90 peaks during the second. That is, the fundamental frequency is not 10 Hz, but 90 Hz. This means, in a one second duration, the signal oscillates 90 times. Here's the equation for the signal:

The Equation for the Underlying Signal. Image by Author.
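
Written out, the equation is x(t) = cos(2π · 90t), where t is time in seconds.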

This signal is a pure cosine with a frequency of 90 Hz, and the plot above samples from this signal at discrete time points during the second. If you're like me and still struggling to accept this, consider the Python code that was used to generate the example:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

SAMPLING_RATE = 100 # The sampling rate in Hz
NUM_SECONDS = 1 # The duration of the signal in seconds
FS = 90 # The fundamental frequency of the signal in Hz

# Create the array of time samples
t = np.arange(0, NUM_SECONDS, 1 / SAMPLING_RATE)

# Create the array of signal values
signal_values = np.cos(2*np.pi*FS*t)

# Plot the signal
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(t, signal_values, label='Linear interpolation')
ax.scatter(t, signal_values, label='Samples')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Signal Value')
ax.set_title('A Signal Sampled Over 1 Second')
ax.legend()
plt.show()

This code plots samples from the pure 90 Hz cosine, exactly as the equation reads. So why do we only see 10 peaks in the original plot? The answer lies in the SAMPLING_RATE parameter.

We are sampling the signal 100 times per second, or at 100 Hz. The time array, t = np.arange(0, NUM_SECONDS, 1 / SAMPLING_RATE), generates 100 time points from 0 up to (but not including) 1 second, at which we sample the cosine. The time array looks like this:

>>> NUM_SECONDS = 1
>>> SAMPLING_RATE = 100
>>> t = np.arange(0, NUM_SECONDS, 1 / SAMPLING_RATE)
>>> t
array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
       0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
       0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
       0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
       0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
       0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
       0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
       0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
       0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
       0.99])

There are 100 time samples in the array because we are sampling the signal at 100 Hz for 1 second. The issue is that a 100 Hz sampling rate isn't high enough to capture a signal that oscillates at 90 Hz – this phenomenon is called aliasing. We can see the aliasing in the following plot:

An Aliased Signal. Image by Author.

In this plot, we've zoomed in on the first 0.25 seconds of the signal to get a better view of the aliasing. The orange line represents the underlying signal – a 90 Hz cosine function. The blue dots represent samples taken at 100 Hz, or every 0.01 seconds. The samples clearly do a poor job of approximating the signal. Sampling the signal 100 times a second, or 25 times in 0.25 seconds, isn't enough to characterize the 90 Hz cosine.
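
If you'd like to recreate a comparison plot like this one, here's a minimal sketch that reuses the imports and constants from the code above. It evaluates the 90 Hz cosine on a much finer grid (10 kHz here, an arbitrary choice made purely for plotting) as a stand-in for the continuous signal and overlays the 100 Hz samples over the first 0.25 seconds:

ZOOM_SECONDS = 0.25 # Zoom in on the first 0.25 seconds

# A very fine grid (10 kHz, chosen only for plotting) stands in for the continuous signal
t_fine = np.arange(0, ZOOM_SECONDS, 1 / 10_000)
t_coarse = np.arange(0, ZOOM_SECONDS, 1 / SAMPLING_RATE)

# Plot the underlying cosine and the samples taken from it
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(t_fine, np.cos(2*np.pi*FS*t_fine), color='tab:orange', label=f'Underlying {FS} Hz cosine')
ax.scatter(t_coarse, np.cos(2*np.pi*FS*t_coarse), label=f'Samples at {SAMPLING_RATE} Hz')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Signal Value')
ax.legend()
plt.show()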

To capture this higher frequency information, we have to increase the sampling rate of the system:

Increasing the Sampling Rate of the System. Image by Author.

It may not be surprising that increasing the sampling rate results in a better approximation of the signal. However, the implications of this phenomenon are profound. Two critical questions arise from this example:

  • How frequently do we need to sample a signal to sufficiently approximate its behavior?
  • Can we detect or prevent aliasing?

We will try to tackle these questions at a high level in this article.

Nyquist-Shannon Sampling Theorem

With an understanding of what aliasing is and the catastrophic effects it can have on time series data, we need to know how often to sample a signal to preserve characteristics of interest. The Nyquist-Shannon Sampling Theorem gives us insight into this.

There are a few variants of the Nyquist-Shannon Sampling Theorem, but the wording we're interested in goes something like this:

If the highest frequency in a continuous signal is F cycles per unit time, you can capture all of the information in the signal by sampling at a rate of at least 2F samples per unit time.

This simple yet powerful theorem gives us everything we need to properly sample band-limited signals. In other words, the Nyquist theorem assumes the underlying signal contains no frequency content above some finite cutoff.
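
In symbols, if f_max is the highest frequency present in the signal and f_s is the sampling rate, the theorem requires f_s ≥ 2 · f_max. For the 90 Hz cosine from the motivating example, that bound is 180 Hz, and in practice it's wise to sample comfortably above it.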

As always, an example will help us get a better grasp of the Nyquist theorem and its implications.

Example – A Pure Cosine

Let's return to the 90 Hz cosine sampled at 100 Hz. Here's the code again:

SAMPLING_RATE = 100 # The sampling rate in Hz
NUM_SECONDS = 1 # The duration of the time series
FS = 90 # The fundamental frequency of the signal

# Create the array of time samples
t = np.arange(0, NUM_SECONDS, 1 / SAMPLING_RATE)

# Create the array of signal values
signal_values = np.cos(2*np.pi*FS*t)

# Plot the signal
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(t, signal_values, label='Linear interpolation')
ax.scatter(t, signal_values, label='Samples')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Signal Value')
ax.set_title(f'$\\cos(2 \\pi \\cdot {FS} t)$ Sampled at {SAMPLING_RATE} Hz')
ax.legend()
plt.show()

And the sampled signal plotted over 1 second:

An Aliased Signal. Image by Author.

Next, we will analyze the sampled cosine in the frequency domain by computing its fast Fourier transform (FFT). The FFT gives us a clearer picture of the frequency content in the signal, and it will deepen our understanding of aliasing. Here's the code to compute the FFT of the sampled signal:

from scipy.fft import fft, fftfreq

# Compute the FFT of the sampled signal
power = np.abs(fft(signal_values))
freqs = fftfreq(n=len(power), d=1/SAMPLING_RATE)

# Plot the FFT with a stem plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.stem(freqs, power)
ax.set_xlabel('Frequency (Hz)')
ax.set_ylabel('Power')
ax.set_title(f'The FFT of $\\cos(2 \\pi \\cdot {FS} t)$ Sampled at {SAMPLING_RATE} Hz')
plt.show()

And the corresponding plot of the FFT:

The FFT of the Aliased Signal. Image by Author.

This plot shows the power, or strength, contained at each frequency in the signal. In this case, there is only one frequency in the sampled signal, and this is represented by spikes at -10 Hz and 10 Hz. A negative frequency is present because this is a real-valued signal, and the FFT is a complex-valued transformation. For this example, all you need to know is that the negative axis is a mirror image of the positive.
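
As a quick sanity check, we can also read the dominant frequency off the FFT programmatically, reusing the power and freqs arrays from the snippet above. The result is the aliased 10 Hz, not the true 90 Hz:

# The frequency bin with the most power is the (aliased) dominant frequency
peak_frequency = abs(freqs[np.argmax(power)])
print(peak_frequency) # roughly 10 Hz, not 90 Hz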

We should make two important observations when looking at this FFT. The first is that all of the frequency content concentrates at +/- 10 Hz, despite the underlying signal being a 90 Hz cosine. The second is that the frequency axis only extends to +/- 50 Hz, which is half of the 100 Hz sampling rate. From this, we draw the following conclusions:

  1. Because of the Nyquist theorem, the FFT can only detect frequencies in a sampled signal that are at or below half the sampling rate (the Nyquist frequency).
  2. If a frequency higher than half the sampling rate is present in the underlying signal, the true frequency will alias to a false lower frequency.

To solidify this idea further, notice what happens to the sampled signal, and its FFT, as the frequency of the underlying signal increases:

Sampling from Higher Frequency Cosines at 100 Hz. Image by Author.

In this animation, we observe that the FFT can detect increasing frequencies in the signal up to the Nyquist frequency of 50 Hz. Once the underlying frequency exceeds 50 Hz, the signal begins to alias in the opposite direction, and the FFT detects increasingly lower frequencies. In particular, frequencies beyond 50 Hz predictably fold back to a frequency between 0 and 50 Hz in the sampled signal.
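
This folding behavior can be written down directly: for a real sinusoid at frequency f sampled at rate fs, the apparent frequency is the distance from f to the nearest integer multiple of fs. Here's a small helper (a sketch, not part of the original example) that reproduces the pattern in the animation:

def apparent_frequency(f, fs):
    """The frequency a real sinusoid at f Hz appears to have when sampled at fs Hz."""
    return abs(f - fs * round(f / fs))

print(apparent_frequency(40, 100)) # 40 Hz - below the Nyquist frequency, no aliasing
print(apparent_frequency(60, 100)) # 40 Hz - folded back below 50 Hz
print(apparent_frequency(90, 100)) # 10 Hz - the example from this article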

For this example, if we want to detect the 90 Hz cosine, we need to increase the sampling rate beyond 180 Hz to satisfy the Nyquist theorem. Here's an animation showing the effect of increasing the sampling rate on both the sampled signal and its FFT:

Increasing the Sampling Rate. Image by Author.

As the sampling rate increases, the domain of the FFT increases to (+/-) half of the sampling rate (the Nyquist frequency). Once the sampling rate exceeds 180 Hz, the dominant frequency in the FFT stays at (+/-) 90 Hz, and there is no longer any aliasing.
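
We can verify this numerically with a quick check (assuming the same imports and constants as before): resample the cosine above 180 Hz and confirm that the FFT peak lands at 90 Hz.

NEW_SAMPLING_RATE = 250 # Comfortably above 2 * 90 Hz

# Resample the 90 Hz cosine at the higher rate
t_new = np.arange(0, NUM_SECONDS, 1 / NEW_SAMPLING_RATE)
new_signal_values = np.cos(2*np.pi*FS*t_new)

# The dominant frequency in the FFT is now the true 90 Hz
new_power = np.abs(fft(new_signal_values))
new_freqs = fftfreq(n=len(new_power), d=1/NEW_SAMPLING_RATE)
print(abs(new_freqs[np.argmax(new_power)])) # roughly 90 Hz - no aliasing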

Bonus: You might have noticed that the FFT has more than two non-zero values at some sampling rates. This is a phenomenon known as spectral leakage, and it happens when the sampling window does not contain a whole number of periods of the signal. There are ways to mitigate this in practice, but it is a general limitation of sampling over a finite window.
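
One common mitigation, shown here only as a sketch, is to multiply the samples by a window function (a Hann window below) before taking the FFT. The window tapers the edges of the sampling interval, which reduces leakage at the cost of slightly widening each spectral peak:

# Taper the samples with a Hann window before computing the FFT
window = np.hanning(len(signal_values))
windowed_power = np.abs(fft(signal_values * window))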

Practical Implications

After this example, you might still be wondering whether aliasing and Nyquist theorem are relevant considerations in your time series project. While this depends on the application and project requirements, here are some general guidelines:

  1. Sensor data: When collecting data from sensors such as temperature sensors, pressure sensors, or accelerometers, the sampling rate needs to be carefully chosen to avoid aliasing. If the sensor outputs contain high-frequency components, inadequate sampling rates can lead to aliasing.
  2. Audio and music processing: In digital audio processing, aliasing can occur when sampling analog audio signals. If the sampling rate is not high enough to capture the entire frequency range of the audio signal, high-frequency components can fold back into the audible range, resulting in unwanted artifacts and distortion.
  3. Video processing: Video signals also need to be sampled at a sufficiently high rate to avoid aliasing. If the sampling rate is too low, high-frequency components in the video signal can cause aliasing, leading to visual artifacts and degradation in the image quality.
  4. Financial time series analysis: In financial markets, high-frequency trading involves capturing market data at very short time intervals. If the sampling rate is inadequate, aliasing can distort the underlying patterns in the data, leading to incorrect trading decisions and financial analysis. This is one motivation for analyzing financial time series at high frequency resolutions close to the transaction level, rather than aggregating the data into arbitrary time bins.

With any time series project, a major key to success is to know your data and domain well. Does it make sense to aggregate data to a lower frequency, or will this eliminate too much valuable information? Always check your assumptions and make sure you're analyzing your time series at the correct resolution.

Overcoming Aliasing

Now that we understand what aliasing is and when it can occur, we should ask ourselves what we can do to combat it. Years of research in time series analysis and signal processing have gone into understanding and combating aliasing, and we won't cover the solutions in depth here. Instead, here are three common ways to deal with aliasing in practice.

Increase the Sampling Rate

The most effective way to combat aliasing is to increase the sampling rate. However, this approach incurs higher computational costs and increased data storage requirements. In short, if you suspect aliasing, and you have the means, increasing the sampling rate is the best option.

Anti-Aliasing Filters

Anti-aliasing filters are crucial components in signal processing systems. These filters are designed to attenuate or eliminate frequency components above the Nyquist frequency, preventing them from aliasing and corrupting the signal. By applying an anti-aliasing filter before sampling or downsampling a signal, we remove unwanted high-frequency components, ensuring a clean and accurate representation of the signal. Butterworth, Chebyshev, and elliptic filters are commonly employed anti-aliasing filter designs, each offering different trade-offs between filter performance and complexity.
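
As an illustration (a minimal sketch rather than a production design), here is how one might apply a Butterworth low-pass filter with SciPy before downsampling a hypothetical 1,000 Hz signal to 100 Hz. The cutoff, filter order, and test signal are arbitrary choices for the example:

from scipy.signal import butter, sosfiltfilt

ORIGINAL_RATE = 1_000 # Sampling rate of the raw signal in Hz
TARGET_RATE = 100 # Sampling rate we want to downsample to in Hz
CUTOFF = 45 # Filter cutoff in Hz, just below the new Nyquist frequency of 50 Hz

# A toy raw signal: a 30 Hz component we want to keep plus a 400 Hz component
# that would alias if we naively kept every 10th sample
t_raw = np.arange(0, 1, 1 / ORIGINAL_RATE)
raw = np.cos(2*np.pi*30*t_raw) + 0.5*np.cos(2*np.pi*400*t_raw)

# Design the anti-aliasing (low-pass) filter and apply it with zero phase shift
sos = butter(N=8, Wn=CUTOFF, btype='low', fs=ORIGINAL_RATE, output='sos')
filtered = sosfiltfilt(sos, raw)

# Now it's safe to downsample by keeping every 10th sample
downsampled = filtered[::ORIGINAL_RATE // TARGET_RATE]

SciPy's scipy.signal.decimate bundles a similar filter-then-downsample step into a single call.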

Compressed Sensing

Compressed sensing is a relatively new technique that enables the reconstruction of sparse signals from a limited number of non-uniform samples, well below the Nyquist rate. This method capitalizes on the sparsity or compressibility of signals in specific domains, such as the wavelet or Fourier domains, facilitating accurate signal recovery while avoiding aliasing effects.

While compressed sensing may not be applicable in all scenarios, its effectiveness is remarkable when it proves suitable. In particular, compressed sensing has demonstrated significant success in addressing image compression challenges due to the inherent compressibility of images.
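
To make the idea concrete, here is a toy sketch (the signal, the 10% random sampling, the DCT as the sparsifying basis, and scikit-learn's Lasso as the L1 solver are all illustrative choices, not a canonical recipe). We keep only 100 random samples out of 1,000 and ask an L1-regularized regression to find a sparse set of DCT coefficients that explains them:

import numpy as np
from scipy.fft import idct
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

n = 1_000 # Length of the full signal (1 second at 1,000 Hz)
t = np.arange(n) / n
signal = np.cos(2*np.pi*97*t) + np.cos(2*np.pi*230*t) # Approximately sparse in the DCT domain

# Keep only 10% of the samples, chosen at random (non-uniform sampling)
keep = np.sort(rng.choice(n, size=100, replace=False))

# Sensing matrix: rows of the inverse-DCT basis at the kept time points
A = idct(np.identity(n), axis=0, norm='ortho')[keep]

# L1-regularized regression recovers a sparse set of DCT coefficients
lasso = Lasso(alpha=0.01, max_iter=10_000)
lasso.fit(A, signal[keep])

# Transform the recovered coefficients back into the time domain
reconstructed = idct(lasso.coef_, norm='ortho')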

Final Thoughts

Aliasing is a fundamental limitation of discrete time series that can deceive us and lead to incorrect interpretations of data. This article has provided an intuitive introduction to aliasing using a simple example of an oscillating signal. We learned that aliasing occurs when the sampling rate is insufficient to capture the frequency content of the underlying signal accurately.

The Nyquist-Shannon Sampling Theorem was introduced as a guideline for determining the minimum sampling rate required to avoid aliasing. By sampling at a rate of at least twice the highest frequency in the signal, we can preserve the information in the original signal.

Practical implications of aliasing were discussed across various domains, such as sensor data, audio and music processing, video processing, and financial time series analysis. In each case, choosing an appropriate sampling rate is crucial to avoid aliasing artifacts and distortion.

To combat aliasing, three common approaches were presented: increasing the sampling rate, using anti-aliasing filters, and employing compressed sensing techniques. Each method has its advantages and considerations, depending on the specific application and resource constraints.

Understanding aliasing and its implications is essential for anyone working with time series data. By being aware of aliasing and employing appropriate techniques, we can ensure accurate and reliable analysis of our data, leading to more informed decision-making and insights.


