How To Measure The Memory Of Time Series Data
What is memory? Why do we need to measure it? How do we measure it?
Time series data is a series of observations obtained over equally spaced time intervals. It appears in many real-world settings such as stock prices, heartbeat data, sensor data, weather forecasting, and more. People usually look at the past values of a time series to forecast its future values. Some types of data are easier to forecast than others. For example, it's easier to predict the future values of a sine wave than the future values of a random signal. Why is that?
Because different time series datasets have different memories.
In time series data, "memory" refers to how strongly the past values influence the future values. If a time series has strong memory, then the past is very indicative of the future. In our example, we can say that the sine wave has strong memory.
If a given time series signal has weak memory, then the past doesn't tell us much about the future. In our example, the random signal has weak memory. But can we measure the memory? How much weaker is the random signal's memory compared to the sine wave's memory?
Why do we need to measure memory?
Before we dive in, we need to quickly talk about long and short memory processes. The distinction comes down to how correlated different points on the timeline are. Let's consider two points on a timeline, A and B. Point A is fixed and point B can be moved around. Let's say we observe that if B is close to A, they tend to be correlated, and that this holds no matter where we place A. But as we increase the distance between A and B, will they still remain correlated? If so, how long will they continue to stay correlated?
This is referred to as the "memory" of the data. As we increase the distance between two points, how strongly will they continue to be correlated? If they continue to stay correlated, we say that it's a long memory process. If not, we say that it's a short memory process.
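One way to make this concrete is to compute the correlation between points that are a fixed distance (a "lag") apart, and watch what happens as the lag grows. The snippet below is a minimal sketch using only the Python standard library. The `autocorrelation` helper, the series length, and the sine wave period are illustrative choices of mine, not something prescribed by a particular library.

```python
import math
import random

def autocorrelation(series, lag):
    """Sample autocorrelation between points that are `lag` steps apart."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
period = 50  # assumed period of the sine wave, in samples

sine_wave = [math.sin(2 * math.pi * t / period) for t in range(n)]
white_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Use lags that are whole multiples of the period, so the sine wave
# is compared with itself at the same phase.
for lag in (50, 100, 200):
    print(
        f"lag={lag:3d}  "
        f"sine={autocorrelation(sine_wave, lag):+.3f}  "
        f"noise={autocorrelation(white_noise, lag):+.3f}"
    )
```

The sine wave should stay strongly correlated with itself even at large lags, while the noise should hover near zero at every lag. That gap is the difference in "memory" we have been describing.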
Why is this important?
Because it has a huge impact on the predictability of time series data. If you want to perform predictions on time series data, you first need to estimate its memory to see if it lends itself to this type of analysis. Otherwise, your forecasts may end up no better than random guesses.
In short memory processes, the past doesn't impact the future that much, so it's difficult to perform predictions on this type of data. On the other hand, long memory processes lend themselves to this type of analysis. In these processes, the past values strongly influence the present and the future.
We'll focus our discussion on long memory processes. How do we estimate the memory of a long memory process? This is where the Hurst exponent comes into the picture.
What is the Hurst exponent?
For a given time series variable, the Hurst exponent is a measure of its long term memory. It tells us how strongly the given time series data tends to regress to its mean or to keep moving in one direction. The value of the Hurst exponent ranges between 0 and 1.
If the value is 0.5, then it indicates that there is no long term correlation between the values in the data. The series behaves like a random walk.
If the value is above 0.5, it indicates that the time series data is persistent. What does that mean? It means that if the values are increasing right now, then it is likely that they will keep increasing in the short term. If the values are decreasing right now, then it is likely that they will keep decreasing in the short term.
If the value is less than 0.5, it indicates that the time series data is anti-persistent. It means that if the values are increasing right now, then it is likely that they will decrease in the short term. If the values are decreasing right now, then it is likely that they will increase in the short term.
In essence, the time series data becomes more "predictable" as the Hurst exponent value moves away from 0.5.
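To see these three regimes in numbers, here is a rough sketch of one common way to estimate the Hurst exponent. It assumes that the spread of the differences between points `lag` steps apart grows like `lag ** H`, so H can be read off as the slope of a log-log fit. This is only one of several estimators (rescaled range analysis is another), and the `hurst_exponent` helper, the `max_lag` default, and the three synthetic series are my own illustrative choices.

```python
import math
import random

def hurst_exponent(series, max_lag=20):
    """Estimate H as the slope of log(std of lagged differences) vs log(lag)."""
    log_lags, log_spreads = [], []
    for lag in range(2, max_lag):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        mean = sum(diffs) / len(diffs)
        spread = math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))
        log_lags.append(math.log(lag))
        log_spreads.append(math.log(spread))
    # Ordinary least-squares slope of log_spreads against log_lags.
    x_bar = sum(log_lags) / len(log_lags)
    y_bar = sum(log_spreads) / len(log_spreads)
    numerator = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(log_lags, log_spreads)
    )
    denominator = sum((x - x_bar) ** 2 for x in log_lags)
    return numerator / denominator

def cumulative_sum(values):
    """Running total, used to turn a list of steps into a path."""
    total, path = 0.0, []
    for value in values:
        total += value
        path.append(total)
    return path

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000

# Anti-persistent example: noise that keeps snapping back to its mean.
white_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Random example: a walk built from independent steps.
random_walk = cumulative_sum(rng.gauss(0.0, 1.0) for _ in range(n))

# Persistent example: a walk whose steps carry momentum (each step is
# 0.9 times the previous step plus fresh noise).
steps, previous = [], 0.0
for _ in range(n):
    previous = 0.9 * previous + rng.gauss(0.0, 1.0)
    steps.append(previous)
trending_walk = cumulative_sum(steps)

for name, series in (
    ("white noise", white_noise),
    ("random walk", random_walk),
    ("trending walk", trending_walk),
):
    print(f"{name:14s} H ~ {hurst_exponent(series):.2f}")
```

With this estimator, the white noise should land well below 0.5, the random walk should land close to 0.5, and the walk with momentum should land well above 0.5. The exact numbers depend on the seed, the series length, and the range of lags, so treat them as estimates and not as precise measurements.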
What about stationarity?
A stationary process is a stochastic process whose statistical properties don’t change when shifted in time. Estimating the stationarity of a process gives us an idea about the predictability of time series data. Isn't the Hurst exponent also designed to give us an idea about predictability? What's the difference here?
Both methods are designed to give us an idea of whether the given time series data lends itself to predictive analysis. The key difference is in the assumptions we need to make about the data in question.
For many time series analysis techniques, we need to make assumptions about the stationarity of the data before we extract its properties. This is restrictive because we don’t really know whether the data is stationary beforehand, and the assumption could easily be wrong. But when we use the Hurst exponent, we can extract the properties of a time series variable without making any assumptions about its stationarity.
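To get a feel for what "properties don't change when shifted in time" means, we can split a series into consecutive windows and compare the mean and variance of each window. The sketch below is a rough eyeball check under that framing, not a formal stationarity test. The `window_stats` helper, the number of windows, and the two synthetic series are hypothetical choices made for illustration.

```python
import random
import statistics

def window_stats(series, n_windows=4):
    """Split the series into equal windows and return (mean, variance) for each."""
    size = len(series) // n_windows
    windows = [series[i * size:(i + 1) * size] for i in range(n_windows)]
    return [(statistics.fmean(w), statistics.pvariance(w)) for w in windows]

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000

# Stationary example: noise with a constant mean and variance.
stationary = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Non-stationary example: the same kind of noise riding on an upward drift.
trending = [0.01 * t + rng.gauss(0.0, 1.0) for t in range(n)]

for name, series in (("stationary", stationary), ("trending", trending)):
    means = [round(mean, 2) for mean, _ in window_stats(series)]
    print(f"{name:10s} window means: {means}")
```

For the stationary series, the window means should all sit close to each other. For the trending series, they should climb from one window to the next, which is a sign that the mean depends on where we look on the timeline.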
Where to go from here?
In this post, we talked about the concept of memory for time series data. We discussed what it means, how to measure it, what the Hurst exponent is, and where stationarity fits in. Time series data appears a lot in the real world, and many machine learning models are being built to make predictions and forecasts from it. Measuring the memory of a given time series variable helps us understand whether the data is inherently unsuitable for forecasting or whether we can do better by refining the model further.