tsfresh


When working with time series data, various features can be calculated based on timestamps and numerical values. This page demonstrates how to calculate features from time series data using tsfresh. Additionally, the accompanying video explains the perspectives from which features can be created.

tsfresh #

Using the tsfresh documentation page "Overview on extracted features" as a reference, let's check what kinds of features are created.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tsfresh import extract_features

# Build 100 series (one per id), each with 10 time steps and four value columns.
rows = []
for series_id, phase in enumerate(np.linspace(0.1, 100, 100)):
    for t in range(10):
        rows.append(
            [
                series_id,
                t,
                t + np.sin(phase),
                t % 2 + np.cos(phase),
                t % 3 + np.tan(phase),
                np.log(phase + t),
            ]
        )

X = pd.DataFrame(rows, columns=["id", "time", "fx1", "fx2", "fx3", "fx4"])
X.head()

   id  time       fx1       fx2       fx3       fx4
0   0     0  0.099833  0.995004  0.100335 -2.302585
1   0     1  1.099833  1.995004  1.100335  0.095310
2   0     2  2.099833  0.995004  2.100335  0.741937
3   0     3  3.099833  1.995004  0.100335  1.131402
4   0     4  4.099833  0.995004  1.100335  1.410987

X[X["id"] == 3].plot(subplots=True, sharex=True, figsize=(12, 10))
plt.show()

[Figure: one subplot per column of X for the series with id 3]
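
To make the idea of a "feature" concrete before running tsfresh, here is a minimal sketch that computes a few simple statistics by hand for one series (column fx1 of id 0). The dictionary keys mirror the tsfresh feature names sum_values, abs_energy, mean_abs_change, mean_change and median. The formulas are my reading of those definitions rather than tsfresh's own code, and the helper name simple_features is hypothetical.

```python
import numpy as np


def simple_features(x):
    """Hand-computed counterparts of a few tsfresh features (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    diffs = np.diff(x)  # differences between consecutive values
    return {
        "sum_values": x.sum(),
        "abs_energy": np.sum(x ** 2),               # sum of squared values
        "mean_abs_change": np.mean(np.abs(diffs)),  # mean absolute step size
        "mean_change": np.mean(diffs),              # mean signed step size
        "median": np.median(x),
    }


# fx1 for id 0 in the data above: time steps 0..9 plus sin(0.1)
fx1_id0 = np.arange(10) + np.sin(0.1)
feats = simple_features(fx1_id0)
for name, value in feats.items():
    print(f"fx1__{name}: {value:.6f}")
```

Each series (one id) is reduced to one number per feature, which is why the extracted feature table has one row per id.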

Calculating Features #

You can calculate all features at once using the extract_features function. Additionally, you can perform feature selection using functions available under tsfresh.feature_selection.

extracted_features = extract_features(X, column_id="id", column_sort="time")
extracted_features.head()

Output (shown transposed: one row per feature, one column per id):

feature                                                             0              1              2              3              4
fx1__variance_larger_than_standard_deviation                      1.0            1.0            1.0            1.0            1.0
fx1__has_duplicate_max                                            0.0            0.0            0.0            0.0            0.0
fx1__has_duplicate_min                                            0.0            0.0            0.0            0.0            0.0
fx1__has_duplicate                                                0.0            0.0            0.0            0.0            0.0
fx1__sum_values                                             45.998334      53.952941      53.538882      45.143194      36.613658
fx1__abs_energy                                            294.084675     373.591982     369.141186     286.290800     216.555992
fx1__mean_abs_change                                              1.0            1.0            1.0            1.0            1.0
fx1__mean_change                                                  1.0            1.0            1.0            1.0            1.0
fx1__mean_second_derivative_central                     -3.469447e-18  -6.938894e-18   0.000000e+00  -8.673617e-19   0.000000e+00
fx1__median                                                  4.599833       5.395294       5.353888       4.514319       3.661366
...                                                               ...            ...            ...            ...            ...
fx4__permutation_entropy__dimension_6__tau_1                     -0.0           -0.0           -0.0           -0.0           -0.0
fx4__permutation_entropy__dimension_7__tau_1                     -0.0           -0.0           -0.0           -0.0           -0.0
fx4__query_similarity_count__query_None__threshold_0.0            NaN            NaN            NaN            NaN            NaN
fx4__matrix_profile__feature_"min"__threshold_0.98                NaN            NaN            NaN            NaN            NaN
fx4__matrix_profile__feature_"max"__threshold_0.98                NaN            NaN            NaN            NaN            NaN
fx4__matrix_profile__feature_"mean"__threshold_0.98               NaN            NaN            NaN            NaN            NaN
fx4__matrix_profile__feature_"median"__threshold_0.98             NaN            NaN            NaN            NaN            NaN
fx4__matrix_profile__feature_"25"__threshold_0.98                 NaN            NaN            NaN            NaN            NaN
fx4__matrix_profile__feature_"75"__threshold_0.98                 NaN            NaN            NaN            NaN            NaN
fx4__mean_n_absolute_max__number_of_maxima_7                 1.915905       1.918724       2.062001       2.186180       2.295964

5 rows × 3156 columns