Detrending Seasonal Data

Detrending Seasonal Data

Welcome! This workshop is from Winder.ai. Sign up to receive more free workshops, training and videos.

statsmodels is a comprehensive library for time series data analysis. And it has a really neat set of functions to detrend data. So if you see that your features have any trends that are time-dependent, then give this a try.

It’s essentially fitting the multiplicative model:

$y(t) = Level * Trend * Seasonality * Noise$

Below we have some data from the 1950’s showing the number of people (monthly, in thousands) flying with an airline. You can see that there is clearly some seasonal variation.

from pandas import Series
import matplotlib.pyplot as plt

series = Series.from_csv('https://s3.eu-west-2.amazonaws.com/assets.winderresearch.com/data/international-airline-passengers.csv', header=0)
series.plot()
plt.show()
/opt/conda/lib/python3.6/site-packages/pandas/core/series.py:2849: FutureWarning: from_csv is deprecated. Please use read_csv(...) instead. Note that some of the default arguments are different, so please refer to the documentation for from_csv when changing your function calls
  infer_datetime_format=infer_datetime_format)
png
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(series, model='multiplicative')
result.plot()
plt.show()
/opt/conda/lib/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools
png

Note how well it de-seasonal-ises the data. After removing the seasonal variation the trend is quite consistent.

More articles

Evidence, Probabilities and Naive Bayes

Probabilistic models are great at promoting good science. I.e. we're trying to model features to predict outputs. In this Python Notebook you will learn how to calculate bayes rule and use a naive bayes classifier.

Read more

Hierarchical Clustering - Agglomerative

Often data is produced by a process that has some natural hierarchy. If you have a clustering problem where this is true, hierarchical clustering works really well. Find out more in this Python Notebook.

Read more
}