Detrending Seasonal Data

A quick Python Notebook to show you how to use statsmodels to detrend seasonal data.

Welcome! This workshop is from Winder.ai.

statsmodels is a comprehensive Python library for statistical modelling, including time series analysis. It has a really neat set of functions for decomposing a series into its trend and seasonal components. If you see that your features have any time-dependent trends, give this a try.

In multiplicative mode, it essentially decomposes the series according to the model:

$y(t) = \text{Level} \times \text{Trend} \times \text{Seasonality} \times \text{Noise}$
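To make the model concrete, here is a minimal sketch that builds a series from the four components and checks that taking logs turns the product into a sum. All the component values (the level of 100, the 2% monthly growth, the 12-month sine cycle, the noise scale) are made up purely for illustration and are not taken from the airline data.

```python
import numpy as np
import pandas as pd

# Hypothetical components, invented for illustration only
index = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
level = 100.0
trend = 1.0 + 0.02 * t                                 # steady growth
seasonality = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)   # repeats every 12 months
rng = np.random.default_rng(0)
noise = rng.normal(loc=1.0, scale=0.01, size=48)       # small fluctuations around 1

# Multiplicative model: y(t) = Level * Trend * Seasonality * Noise
y = pd.Series(level * trend * seasonality * noise, index=index)

# Taking logs turns the product into a sum of the logged components
log_sum = np.log(level) + np.log(trend) + np.log(seasonality) + np.log(noise)
print(np.allclose(np.log(y), log_sum))
```

This is why the seasonal swings in a multiplicative series grow as the trend grows: each component scales the others rather than adding to them.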

Below we have some data from the 1950s showing the monthly number of people, in thousands, flying with an airline. You can see that there is clear seasonal variation.

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV, parse the first column as a datetime index,
# and squeeze the single data column into a Series
url = 'https://s3.eu-west-2.amazonaws.com/assets.winderresearch.com/data/international-airline-passengers.csv'
series = pd.read_csv(url, header=0, index_col=0, parse_dates=True).squeeze('columns')
series.plot()
plt.show()

[Figure: line plot of the monthly airline passenger series]

from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(series, model='multiplicative')
result.plot()
plt.show()

[Figure: decomposition plot with observed, trend, seasonal and residual panels]

Note how well it deseasonalises the data. After removing the seasonal variation, the trend is smooth and consistent.