The Anomaly Detection Toolkit (ADTK) is used here to perform anomaly detection. This time, detection is applied to multidimensional data: a second, artificial dimension is derived from the original series so that multivariate detectors can be tried.
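import numpy as np
import pandas as pd
from adtk.data import validate_series
# Load the training data, parsing the "timestamp" column into a DatetimeIndex.
s_train = pd.read_csv("./training.csv", index_col="timestamp", parse_dates=True)
# Validate the time index before passing the data to ADTK.
s_train = validate_series(s_train)
# Derive a second, artificial dimension from the original values (vectorized).
s_train["value2"] = np.sin(s_train["value"]) + np.cos(s_train["value"])
s_train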
timestamp | value | value2 |
---|---|---|
2014-04-01 00:00:00 | 18.090486 | 0.037230 |
2014-04-01 00:05:00 | 20.359843 | 1.058643 |
2014-04-01 00:10:00 | 21.105470 | 0.141581 |
2014-04-01 00:15:00 | 21.151585 | 0.076564 |
2014-04-01 00:20:00 | 18.137141 | 0.103122 |
... | ... | ... |
2014-04-14 23:35:00 | 18.269290 | 0.288071 |
2014-04-14 23:40:00 | 19.087351 | 1.207420 |
2014-04-14 23:45:00 | 19.594689 | 1.413067 |
2014-04-14 23:50:00 | 19.767817 | 1.401750 |
2014-04-14 23:55:00 | 20.479156 | 0.939501 |
4032 rows × 2 columns
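The value2 column is a pointwise transformation of value, so any row of the table can be checked by hand. The snippet below is a small sanity check using only the standard library; the helper name derive_value2 is hypothetical and is not part of ADTK or the original notebook.

```python
import math


def derive_value2(v: float) -> float:
    """Apply the same formula used for the value2 column: sin(v) + cos(v)."""
    return math.sin(v) + math.cos(v)


# "value" in the first row of the table above (as printed, i.e. rounded).
first_value = 18.090486
first_value2 = derive_value2(first_value)

# The table shows 0.037230 for this row; allow for the rounding of the input.
print(abs(first_value2 - 0.037230) < 1e-4)
```

Because sin(v) + cos(v) always lies between -√2 and √2 (about ±1.414), value2 is bounded regardless of how large value gets, which is consistent with the largest entries shown in the table.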
from adtk.visualization import plot
plot(s_train)
Anomaly detection is performed with three multivariate detectors: OutlierDetector, RegressionAD, and PcaAD. Other methods can be found in Detector.
import matplotlib.pyplot as plt
from adtk.detector import OutlierDetector, PcaAD, RegressionAD
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import LocalOutlierFactor
# Three multivariate detectors, keyed by the name used as the plot title.
model_dict = {
    "OutlierDetector": OutlierDetector(LocalOutlierFactor(contamination=0.05)),
    "RegressionAD": RegressionAD(regressor=LinearRegression(), target="value2", c=3.0),
    # Note: with only two columns, k=2 keeps every principal component, so the
    # reconstruction error is close to zero everywhere; k=1 may be more informative.
    "PcaAD": PcaAD(k=2),
}
# Fit each detector on the training data and plot the anomalies it flags.
for model_name, model in model_dict.items():
    anomalies = model.fit_detect(s_train)
    plot(
        s_train,
        anomaly=anomalies,
        ts_linewidth=1,
        ts_markersize=3,
        anomaly_color="red",
        anomaly_alpha=0.3,
        curve_group="all",
    )
    plt.title(model_name)
    plt.show()
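To give some intuition for what a residual-based detector such as RegressionAD with c=3.0 is doing, here is a conceptual sketch in plain Python. It is not ADTK's implementation: it assumes that the detector fits a regression of the target on the other column and flags points whose absolute residual exceeds an interquartile-range bound scaled by c. The function name regression_residual_anomalies and the toy data are hypothetical.

```python
import statistics


def regression_residual_anomalies(x, y, c=3.0):
    """Flag points whose |residual| from a least-squares line exceeds Q3 + c * IQR.

    Conceptual sketch only; the thresholding rule is an assumption.
    """
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    # Ordinary least squares for a single feature: y ~ slope * x + intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    abs_resid = [abs(yi - (slope * xi + intercept)) for xi, yi in zip(x, y)]
    # Upper bound of the "normal" range from the quartiles of the residuals.
    q1, _, q3 = statistics.quantiles(abs_resid, n=4)
    upper = q3 + c * (q3 - q1)
    return [r > upper for r in abs_resid]


# Toy data: a perfect line y = 2x + 1 with one point pushed off the line.
x = list(range(20))
y = [2.0 * xi + 1.0 for xi in x]
y[7] += 15.0

flags = regression_residual_anomalies(x, y)
print([i for i, flagged in enumerate(flags) if flagged])
```

A larger c widens the normal range and flags fewer points. In the notebook, the two columns are related by sin(v) + cos(v) rather than by a straight line, so a LinearRegression regressor may leave large residuals even at normal points.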