tsfresh


When working with time series data, we often compute a variety of features from the timestamp and numeric columns. This page shows how to compute features from time series data using tsfresh. In addition, this video explains different perspectives for creating features.

tsfresh #

Using "Overview on extracted features" as a reference, let's check what kinds of features are created.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tsfresh import extract_features

# Build 100 synthetic series (one per id), each with 10 time steps
# and four value columns fx1..fx4.
rows = []
for series_id, it in enumerate(np.linspace(0.1, 100, 100)):
    for jt in range(10):
        rows.append(
            [
                series_id,
                jt,
                jt + np.sin(it),
                jt % 2 + np.cos(it),
                jt % 3 + np.tan(it),
                np.log(it + jt),
            ]
        )

# Long format: one row per (id, time) pair.
X = pd.DataFrame(rows, columns=["id", "time", "fx1", "fx2", "fx3", "fx4"])
X.head()

   id  time       fx1       fx2       fx3       fx4
0   0     0  0.099833  0.995004  0.100335 -2.302585
1   0     1  1.099833  1.995004  1.100335  0.095310
2   0     2  2.099833  0.995004  2.100335  0.741937
3   0     3  3.099833  1.995004  0.100335  1.131402
4   0     4  4.099833  0.995004  1.100335  1.410987

X[X["id"] == 3].plot(subplots=True, sharex=True, figsize=(12, 10))
plt.show()

[Figure: line plots of each column for the series with id 3, one subplot per column]

Computing Features #

With the extract_features function, you can compute all features in a single call. You can also perform feature selection using the functions available under the tsfresh.feature_selection module.

extracted_features = extract_features(X, column_id="id", column_sort="time")
extracted_features.head()

feature \ id                                                         0             1             2             3             4
fx1__variance_larger_than_standard_deviation                       1.0           1.0           1.0           1.0           1.0
fx1__has_duplicate_max                                             0.0           0.0           0.0           0.0           0.0
fx1__has_duplicate_min                                             0.0           0.0           0.0           0.0           0.0
fx1__has_duplicate                                                 0.0           0.0           0.0           0.0           0.0
fx1__sum_values                                              45.998334     53.952941     53.538882     45.143194     36.613658
fx1__abs_energy                                             294.084675    373.591982    369.141186    286.290800    216.555992
fx1__mean_abs_change                                               1.0           1.0           1.0           1.0           1.0
fx1__mean_change                                                   1.0           1.0           1.0           1.0           1.0
fx1__mean_second_derivative_central                      -3.469447e-18 -6.938894e-18  0.000000e+00 -8.673617e-19  0.000000e+00
fx1__median                                                   4.599833      5.395294      5.353888      4.514319      3.661366
...
fx4__permutation_entropy__dimension_6__tau_1                      -0.0          -0.0          -0.0          -0.0          -0.0
fx4__permutation_entropy__dimension_7__tau_1                      -0.0          -0.0          -0.0          -0.0          -0.0
fx4__query_similarity_count__query_None__threshold_0.0             NaN           NaN           NaN           NaN           NaN
fx4__matrix_profile__feature_"min"__threshold_0.98                 NaN           NaN           NaN           NaN           NaN
fx4__matrix_profile__feature_"max"__threshold_0.98                 NaN           NaN           NaN           NaN           NaN
fx4__matrix_profile__feature_"mean"__threshold_0.98                NaN           NaN           NaN           NaN           NaN
fx4__matrix_profile__feature_"median"__threshold_0.98              NaN           NaN           NaN           NaN           NaN
fx4__matrix_profile__feature_"25"__threshold_0.98                  NaN           NaN           NaN           NaN           NaN
fx4__matrix_profile__feature_"75"__threshold_0.98                  NaN           NaN           NaN           NaN           NaN
fx4__mean_n_absolute_max__number_of_maxima_7                  1.915905      1.918724      2.062001      2.186180      2.295964

5 rows × 3156 columns
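
To make a few of the fx1 columns concrete, the sketch below recomputes them with plain NumPy for the series with id 0. The formulas are my reading of the corresponding tsfresh feature calculators (sum, sum of squares, mean absolute difference, mean difference, median), so treat them as an illustration rather than the library's exact implementation.

```python
import numpy as np

# Series with id 0: fx1 = jt + sin(it), with it = 0.1 and jt = 0..9.
it = 0.1
fx1 = np.arange(10) + np.sin(it)

sum_values = fx1.sum()                             # fx1__sum_values
abs_energy = np.sum(fx1 ** 2)                      # fx1__abs_energy
mean_abs_change = np.mean(np.abs(np.diff(fx1)))    # fx1__mean_abs_change
mean_change = (fx1[-1] - fx1[0]) / (len(fx1) - 1)  # fx1__mean_change
median = np.median(fx1)                            # fx1__median

# fx1__variance_larger_than_standard_deviation, reported as 1.0 when true.
variance_larger_than_std = float(np.var(fx1) > np.std(fx1))

print(f"sum_values      = {sum_values:.6f}")
print(f"abs_energy      = {abs_energy:.6f}")
print(f"mean_abs_change = {mean_abs_change:.1f}")
print(f"mean_change     = {mean_change:.1f}")
print(f"median          = {median:.6f}")
print(f"var > std       = {variance_larger_than_std}")
```

Because fx1 increases by exactly 1 at every step, both change features come out as 1.0 for every id, and only the sin(it) offset differs between series.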