A moving average takes the mean of a fixed-size range of values, and that range is shifted forward continuously as the series progresses. It is called "moving" because the range of values keeps shifting, and "average" because it takes the average (mean) of the values in that range.
The core assumption of the moving average is that the data is stationary and has a slowly varying mean. If the data is not stationary, we first need to make it stationary before applying the moving average.
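One common way to make a trending series closer to stationary is first differencing, where each value is replaced by its change from the previous value. The sketch below is only an illustration of that idea; the article does not say which transformation it has in mind, and the helper name first_difference is hypothetical.

```python
def first_difference(values):
    """Return the change between each value and the one before it."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


# A series with a steady upward trend (made-up numbers).
trending = [100, 103, 106, 109, 112]

# After differencing, the trend is gone and the values hover around a constant.
diffed = first_difference(trending)
print(diffed)  # [3, 3, 3, 3]
```

With pandas, the equivalent operation on a Series is its diff() method.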
Moving Average Smoothening
Smoothening in time series is a technique for reducing the fine-grained variation between time steps so that the underlying pattern is easier to see. The moving average is one of the most commonly used smoothening processes in time series. It creates a new series from the raw series in which every new value is the average of the last n values. The value n is called the window size or window width.
Types of Moving Average
There are various types of moving averages:
1. Simple Moving Average
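It is simply the average of the most recent n values, recomputed each time the window moves forward by one step.
$$\mathrm{SMA}_t = \frac{x_t + x_{t-1} + \dots + x_{t-n+1}}{n}$$
where x_t = value at the current time t and n = number of data points (window size)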
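To make the definition concrete, here is a small pure-Python sketch of a simple moving average. The function name and the choice to return None where fewer than n values are available are illustrative assumptions, not something the article prescribes.

```python
def simple_moving_average(values, n):
    """Trailing n-point mean; positions with fewer than n values get None."""
    result = []
    for t in range(len(values)):
        if t + 1 < n:
            # Not enough history yet to fill a full window.
            result.append(None)
        else:
            window = values[t - n + 1 : t + 1]
            result.append(sum(window) / n)
    return result


series = [2, 4, 6, 8, 10]
sma = simple_moving_average(series, n=3)
print(sma)  # [None, None, 4.0, 6.0, 8.0]
```

Each output value is the mean of the current value and the two before it, for example (2 + 4 + 6) / 3 = 4.0.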
2. Weighted Moving Average
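It is the weighted average of the most recent n values, in which more recent values are given more weight.
$$\mathrm{WMA}_t = \frac{\sum_{i=0}^{n-1} w_i \, x_{t-i}}{\sum_{i=0}^{n-1} w_i}$$
where w_i = weight applied to the value i steps in the past (larger for smaller i)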
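A short worked example may help here. The sketch below assumes the weights are listed from the oldest to the newest value in the window and divides by the sum of the weights, so the result stays on the same scale as the data. The function name is hypothetical.

```python
def weighted_moving_average(values, weights):
    """Trailing weighted mean; weights are ordered oldest -> newest."""
    n = len(weights)
    total_weight = sum(weights)
    result = []
    for t in range(len(values)):
        if t + 1 < n:
            result.append(None)  # window not full yet
        else:
            window = values[t - n + 1 : t + 1]
            weighted_sum = sum(w * x for w, x in zip(weights, window))
            result.append(weighted_sum / total_weight)
    return result


# The newest value gets weight 1.5, the oldest 0.5.
wma_values = weighted_moving_average([10, 12, 11, 13], weights=[0.5, 1, 1.5])
print(wma_values)
```

For the first full window this computes (0.5 * 10 + 1 * 12 + 1.5 * 11) / 3, which is about 11.17.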
3. Exponential Moving Average
It is another type of moving average in which recent values are given more weight, with the weights decreasing exponentially as values get older.
- There is no need to decide on weights manually.
- It adapts more quickly to changes in the data.
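To see why no manual weights are needed, the sketch below lists the weights that a recursive exponential moving average implicitly places on the current value and the values before it. It assumes the common convention alpha = 2 / (span + 1) for turning a window-like span into a decay rate; that convention and the function name are assumptions, not taken from the article.

```python
def ema_weights(alpha, count):
    """Implicit weights on the current value and the (count - 1) values before it."""
    return [alpha * (1 - alpha) ** k for k in range(count)]


span = 3                # hypothetical window-like parameter
alpha = 2 / (span + 1)  # assumed span-to-alpha convention, gives 0.5 here
weights = ema_weights(alpha, 4)
print(weights)  # [0.5, 0.25, 0.125, 0.0625]
```

Each step back in time multiplies the weight by (1 - alpha), so a single parameter determines the whole weighting scheme.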
4. Exponential Smoothening
This type has an explicit smoothening factor α which can be used to control the weights of the values.
- A value of α near 1 gives more weight to the most recent data points.
- A value of α near 0 gives more weight to older data points.
$$s_t = \alpha \, x_t + (1 - \alpha) \, s_{t-1}$$
where α = smoothening parameter (between 0 and 1), x_t = observed value at time t, and s_t = smoothed value at time t
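The effect of α can be seen with a small sketch of the usual recursion, in which each smoothed value is α times the new observation plus (1 - α) times the previous smoothed value. Seeding the recursion with the first observation is an assumption made here for simplicity.

```python
def exponential_smoothing(values, alpha):
    """Recursive exponential smoothing, seeded with the first observation."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# A made-up series with a sudden jump from 10 to 20.
series = [10, 10, 10, 20, 20, 20]

fast = exponential_smoothing(series, alpha=0.9)  # follows recent values closely
slow = exponential_smoothing(series, alpha=0.1)  # leans on older values

print(fast)
print(slow)
```

One step after the jump, the α = 0.9 series has already moved most of the way to 20, while the α = 0.1 series has moved only a tenth of the way.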
Applications of Moving Average
- Moving Average can be used to create a smoothed version of the original dataset.
- It is used across various analytics domains.
- In streaming analytics, a lot of window functions come from the moving averages.
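As an illustration of the streaming case, a moving average can be maintained incrementally, one value at a time, without storing the full history. The class below is a hypothetical sketch that keeps only the last window_size values and a running total.

```python
from collections import deque


class StreamingMovingAverage:
    """Keeps the mean of the most recent `window_size` values of a stream."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)
        self.total = 0.0

    def update(self, value):
        # When the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


stream = StreamingMovingAverage(window_size=3)
averages = [stream.update(v) for v in [3, 6, 9, 12]]
print(averages)  # [3.0, 4.5, 6.0, 9.0]
```

Until the window fills up, this sketch averages over however many values have arrived so far, which is one of several reasonable choices.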
Let’s code and see how these moving averages are applied to time series data. We will also compare the results.
1. Load Dataset
The data is the Electric_Production dataset from Kaggle.

    import pandas as pd

    data = pd.read_csv("Electric_Production.csv")
    data.head()
2. Make Date as Index
When working with time series data, a common first step is to set the date column as the index.

    data["DATE"] = pd.to_datetime(data["DATE"])
    data.set_index("DATE", inplace=True)
    data.head()
3. Simple Moving Average
The moving average is calculated using the past 3 values. Therefore, the first 3 values are NaN, since they do not have 3 past values.
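A minimal sketch of this step, assuming the same rolling(...).shift(1) pattern that this walkthrough uses for the weighted moving average. The column name ma_rolling_3 and the small demo frame are hypothetical stand-ins; on the real dataset the same line would be applied to data.

```python
import pandas as pd

# Stand-in for the tutorial's `data` frame (values are made up for illustration).
demo = pd.DataFrame(
    {"IPG2211A2N": [10.0, 12.0, 11.0, 13.0, 15.0]},
    index=pd.date_range("2020-01-01", periods=5, freq="MS"),
)

# Mean of a 3-value window, then shifted by one step so that each row
# only uses the 3 values that came before it.
demo["ma_rolling_3"] = demo["IPG2211A2N"].rolling(window=3).mean().shift(1)
print(demo)
```

The first three rows of ma_rolling_3 are NaN because a full window of three past values is not yet available for them.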
4. Weighted Moving Average
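Let us calculate the weighted moving average to see how well it captures the fluctuations of the original series. As with the simple moving average, we will use the past 3 values to calculate the weighted moving average.

    import numpy as np

    def wma(weights):
        # Build a function that returns the weighted average of one rolling window.
        def calc(x):
            # Divide by the sum of the weights so the result stays on the data's scale.
            return (weights * x).sum() / weights.sum()
        return calc

    # Weights run from the oldest to the newest value in each 3-value window.
    data["wma_rolling_3"] = data["IPG2211A2N"].rolling(window=3).apply(wma(np.array([0.5, 1, 1.5]))).shift(1)
    data[data.index.year > 2010].plot()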
For better clarity, we have plotted only the data after 2010. The blue line represents the original data, the orange line represents the simple moving average, and the green line represents the weighted moving average.
The graph suggests that the weighted moving average picks up trends sooner than the simple moving average. The drawback is that it is more complex.
5. Exponential Moving Average
As with the weighted moving average, the data is plotted after 2011 to better capture the fluctuations. The graph shows that sometimes the exponential moving average outperforms the weighted moving average, and sometimes it is the other way round.
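A series like the one described could be produced along the following lines. This is a sketch under stated assumptions: it uses pandas' ewm with span=3 and adjust=False to mirror the 3-value window of the earlier steps, and the column name ewm_window_3 and the demo frame are hypothetical. The article does not state the parameters it used.

```python
import pandas as pd

# Stand-in for the tutorial's `data` frame (values are made up for illustration).
demo = pd.DataFrame(
    {"IPG2211A2N": [10.0, 12.0, 11.0, 13.0, 15.0]},
    index=pd.date_range("2020-01-01", periods=5, freq="MS"),
)

# span=3 corresponds to alpha = 2 / (3 + 1) = 0.5; adjust=False selects the
# recursive form. shift(1) makes each row depend only on earlier values.
demo["ewm_window_3"] = (
    demo["IPG2211A2N"].ewm(span=3, adjust=False).mean().shift(1)
)
print(demo)
```

Unlike the rolling averages, only the first row is NaN here, because the exponential average is defined from the first observation onward.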
6. Exponential Smoothening Average
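For this step, a minimal sketch is to pass the smoothening factor α directly to pandas' ewm. The value alpha=0.7, the column name esm_window_3_7, and the demo frame are all assumptions chosen for illustration; the article does not state which value it used.

```python
import pandas as pd

# Stand-in for the tutorial's `data` frame (values are made up for illustration).
demo = pd.DataFrame(
    {"IPG2211A2N": [10.0, 12.0, 11.0, 13.0, 15.0]},
    index=pd.date_range("2020-01-01", periods=5, freq="MS"),
)

# alpha is given explicitly instead of being derived from a span.
# A value of 0.7 puts most of the weight on the latest observation.
demo["esm_window_3_7"] = (
    demo["IPG2211A2N"].ewm(alpha=0.7, adjust=False).mean().shift(1)
)
print(demo)
```

Trying a few different values of alpha on the real data is a quick way to see how strongly the smoothed line follows or lags the original series.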
7. Root Mean Squared Error
We have compared different moving averages using the graph. Now, we will use Root Mean Squared Error to compare the values.
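One way to carry out this comparison is sketched below. The helper rmse is hypothetical; it drops rows where either series is NaN (the leading rows of each moving average) before computing the error. The demo frame uses made-up values, so the number it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd


def rmse(actual, forecast):
    """Root mean squared error over rows where both series have a value."""
    pair = pd.concat([actual, forecast], axis=1).dropna()
    errors = pair.iloc[:, 0] - pair.iloc[:, 1]
    return float(np.sqrt((errors ** 2).mean()))


# Stand-in for the tutorial's `data` frame (values are made up for illustration).
demo = pd.DataFrame({"IPG2211A2N": [10.0, 12.0, 11.0, 13.0, 15.0]})
demo["ma_rolling_3"] = demo["IPG2211A2N"].rolling(window=3).mean().shift(1)

score = rmse(demo["IPG2211A2N"], demo["ma_rolling_3"])
print(score)
```

The same function can be called once per moving-average column on the real data, and the resulting scores compared directly since they are all in the units of the original series.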
From the above calculation, we can see that the error is lowest for the exponential smoothening average and highest for the simple moving average. This matches what we saw in the graphs.