We will perform anomaly detection with the Anomaly Detection Toolkit (ADTK), this time applying it to artificial multidimensional data.
import numpy as np
import pandas as pd
from adtk.data import validate_series
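# Load the data with the timestamp column as a DatetimeIndex
s_train = pd.read_csv("./training.csv", index_col="timestamp", parse_dates=True)
# validate_series checks the time index and fixes common issues such as unsorted or duplicated timestamps
s_train = validate_series(s_train)
# Derive a second dimension from the original values
s_train["value2"] = np.sin(s_train["value"]) + np.cos(s_train["value"])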
s_train
timestamp | value | value2 |
---|---|---|
2014-04-01 00:00:00 | 18.090486 | 0.037230 |
2014-04-01 00:05:00 | 20.359843 | 1.058643 |
2014-04-01 00:10:00 | 21.105470 | 0.141581 |
2014-04-01 00:15:00 | 21.151585 | 0.076564 |
2014-04-01 00:20:00 | 18.137141 | 0.103122 |
... | ... | ... |
2014-04-14 23:35:00 | 18.269290 | 0.288071 |
2014-04-14 23:40:00 | 19.087351 | 1.207420 |
2014-04-14 23:45:00 | 19.594689 | 1.413067 |
2014-04-14 23:50:00 | 19.767817 | 1.401750 |
2014-04-14 23:55:00 | 20.479156 | 0.939501 |
4032 rows × 2 columns
from adtk.visualization import plot
plot(s_train)
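If `training.csv` is not at hand, a frame with the same shape can be generated for experimentation. The following is a minimal sketch, not the data used above: the function name `make_synthetic_frame`, the daily sine pattern, the noise level, and the seed are all assumptions chosen for illustration. Only the 5-minute sampling, the row count, and the `value2` formula come from the walkthrough.

```python
import numpy as np
import pandas as pd


def make_synthetic_frame(n_rows=4032, seed=0):
    """Hypothetical stand-in for training.csv: 5-minute samples with a daily cycle."""
    rng = np.random.default_rng(seed)
    index = pd.date_range(
        "2014-04-01", periods=n_rows, freq="5min", name="timestamp"
    )
    # 288 samples make up one day at 5-minute resolution
    phase = 2 * np.pi * np.arange(n_rows) / 288
    # Assumed signal: values around 20 with a daily swing plus Gaussian noise
    value = 20 + 2 * np.sin(phase) + rng.normal(scale=0.3, size=n_rows)
    df = pd.DataFrame({"value": value}, index=index)
    # Same derived column as in the walkthrough
    df["value2"] = np.sin(df["value"]) + np.cos(df["value"])
    return df


synthetic = make_synthetic_frame()
print(synthetic.head())
```

The resulting frame can be passed to `validate_series` and `plot` in place of `s_train`, although the detected anomalies will of course differ from those in the original data.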
We will detect anomalies with three multivariate detectors: OutlierDetector, RegressionAD, and PcaAD. For other methods, see Detector.
import matplotlib.pyplot as plt
from adtk.detector import OutlierDetector, PcaAD, RegressionAD
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import LocalOutlierFactor
model_dict = {
    "OutlierDetector": OutlierDetector(LocalOutlierFactor(contamination=0.05)),
    "RegressionAD": RegressionAD(regressor=LinearRegression(), target="value2", c=3.0),
    # With only two columns, k=2 would reconstruct the data exactly, so keep one component
    "PcaAD": PcaAD(k=1),
}
for model_name, model in model_dict.items():
    # fit_detect returns a series flagging the anomalous timestamps
    anomalies = model.fit_detect(s_train)
    plot(
        s_train,
        anomaly=anomalies,
        ts_linewidth=1,
        ts_markersize=3,
        anomaly_color="red",
        anomaly_alpha=0.3,
        curve_group="all",
    )
    plt.title(model_name)
    plt.show()
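To build some intuition for what the regression-based and PCA-based detectors look at, the sketch below reproduces the two underlying ideas with plain NumPy. It is a simplification and not ADTK's implementation: the helper names (`iqr_flags`, `regression_residual_flags`, `pca_reconstruction_error`), the toy data, and the injected outlier are assumptions made for illustration. The regression idea flags points whose residual from a fitted line falls far outside the interquartile range; the PCA idea scores each point by how poorly it is reconstructed from the top `k` principal axes.

```python
import numpy as np


def iqr_flags(scores, c=3.0):
    """Flag scores outside [Q1 - c*IQR, Q3 + c*IQR]."""
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    return (scores < q1 - c * iqr) | (scores > q3 + c * iqr)


def regression_residual_flags(x, y, c=3.0):
    """Fit y ~ slope*x + intercept by least squares and flag extreme residuals."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    return iqr_flags(residuals, c=c)


def pca_reconstruction_error(data, k):
    """Squared error after projecting centered data onto its top-k principal axes."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    reconstructed = centered @ components.T @ components
    return ((centered - reconstructed) ** 2).sum(axis=1)


# Toy data (assumed): two strongly correlated columns with one broken point
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.1, size=500)
y[100] += 5.0  # this point violates the linear relationship

data = np.column_stack([x, y])
flags = regression_residual_flags(x, y)
errors_k1 = pca_reconstruction_error(data, k=1)
errors_k2 = pca_reconstruction_error(data, k=2)

print("regression flags index 100:", bool(flags[100]))
print("largest k=1 reconstruction error at index:", int(errors_k1.argmax()))
print("max k=2 reconstruction error:", float(errors_k2.max()))
```

One design point worth noting: when `k` equals the number of columns, the projection keeps every direction, so the reconstruction error is zero up to floating-point noise and carries no information about anomalies. The error is only meaningful when `k` is smaller than the number of columns.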