Trend & Periodicity

Time-series data are sometimes considered to be composed of seasonality, trend components, and residuals, and there are times when we want to decompose time-series data into these three components. STL decomposition decomposes data into seasonality, trend components, and residuals based on the specified period length. This method is effective when the length of the period of the seasonal component (waves that occur repeatedly) is constant and the length of the period is known in advance.

import japanize_matplotlib
import matplotlib.pyplot as plt
import numpy as np

from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.seasonal import STL

Sample data created

The data is a combination of several periodic values, and the trend changes in a segmented manner. Noise is also added by np.random.rand().

date_list = pd.date_range("2021-01-01", periods=365, freq="D")
value_list = [
    10
    + i % 14
    + np.log(1 + i % 28 + np.random.rand())
    + np.sqrt(1 + i % 7 + np.random.rand()) * 2
    + (((i - 100) / 10)) * (i > 100)
    - ((i - 200) / 7) * (i > 200)
    + np.random.rand()
    for i, di in enumerate(date_list)
]

df = pd.DataFrame(
    {
        "日付": date_list,
        "観測値": value_list,
    }
)

df.head(10)

日付観測値
02021-01-0113.874478
12021-01-0215.337944
22021-01-0317.166133
32021-01-0419.561274
42021-01-0520.426502
52021-01-0622.127725
62021-01-0724.484329
72021-01-0822.313675
82021-01-0923.716303
92021-01-1025.861587
plt.figure(figsize=(10, 5))
sns.lineplot(x=df["日付"], y=df["観測値"])
plt.grid(axis="x")
plt.show()

png

Decompose into trend, periodicity, and residuals

Time series data using statsmodels.tsa.seasonal.STL

  • Trend (.trend)
  • Seasonal/periodic(.seasonal)
  • Residual(.resid)

(In the following code, dr means DecomposeResult.

Cleveland, Robert B., et al. “STL: A seasonal-trend decomposition.” J. Off. Stat 6.1 (1990): 3-73. (pdf)

period = 28
stl = STL(df["観測値"], period=period)
dr = stl.fit()
_, axes = plt.subplots(figsize=(12, 8), ncols=1, nrows=4, sharex=True)

axes[0].set_title("観測値")
axes[0].plot(dr.observed)
axes[0].grid()

axes[1].set_title("トレンド")
axes[1].plot(dr.trend)
axes[1].grid()

axes[2].set_title("季節性")
axes[2].plot(dr.seasonal)
axes[2].grid()

axes[3].set_title("その他の要因・残差")
axes[3].plot(dr.resid)
axes[3].grid()

plt.tight_layout()
plt.show()

png

Check for resistance to outliers

def check_outlier():
    period = 28
    df_outlier = df["観測値"].copy()

    for i in range(1, 6):
        df_outlier[i * 50] = df_outlier[i * 50] * 1.4

    stl = STL(df_outlier, period=period, trend=31)
    dr_outlier = stl.fit()

    _, axes = plt.subplots(figsize=(12, 8), ncols=1, nrows=4, sharex=True)

    axes[0].set_title("観測値")
    axes[0].plot(dr_outlier.observed)
    axes[0].grid()

    axes[1].set_title("トレンド")
    axes[1].plot(dr_outlier.trend)
    axes[1].grid()

    axes[2].set_title("季節性")
    axes[2].plot(dr_outlier.seasonal)
    axes[2].grid()

    axes[3].set_title("その他の要因・残差")
    axes[3].plot(dr_outlier.resid)
    axes[3].grid()


check_outlier()

png

Check to see if the trend is extracted

If the trend is correctly extracted, the trend should not contain a cyclical component and the PACF (Partial Autocorrelation) should be close to zero.

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

_, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
plt.suptitle("トレンドの自己相関・偏自己相関")
plot_acf(dr.trend.dropna(), ax=axes[0])
plot_pacf(dr.trend.dropna(), method="ywm", ax=axes[1])
plt.tight_layout()
plt.show()

_, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
plt.suptitle("残差の自己相関・偏自己相関")
plot_acf(dr.resid.dropna(), ax=axes[0])
plot_pacf(dr.resid.dropna(), method="ywm", ax=axes[1])
plt.tight_layout()
plt.show()

png

png

Portmanteau test

from statsmodels.stats.diagnostic import acorr_ljungbox

acorr_ljungbox(dr.resid.dropna(), lags=int(np.log(df.shape[0])))

lb_statlb_pvalue
13.9622970.046530
24.8235140.089658
38.4567530.037457
412.0928930.016674
523.1344670.000318

Comments

(Comments will appear after approval)