Let’s try anomaly detection using the Anomaly Detection Toolkit (ADTK).
This time, we will apply anomaly detection to multidimensional (multivariate) synthetic data.
import numpy as np
import pandas as pd
from adtk.data import validate_series
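# Load the training data with the timestamp column as a DatetimeIndex
s_train = pd.read_csv("./training.csv", index_col="timestamp", parse_dates=True)
# Check the time index for common problems (unsorted or duplicated timestamps)
s_train = validate_series(s_train)
# Derive a second dimension from the first one
s_train["value2"] = np.sin(s_train["value"]) + np.cos(s_train["value"])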
s_train
timestamp | value | value2 |
---|---|---|
2014-04-01 00:00:00 | 18.090486 | 0.037230 |
2014-04-01 00:05:00 | 20.359843 | 1.058643 |
2014-04-01 00:10:00 | 21.105470 | 0.141581 |
2014-04-01 00:15:00 | 21.151585 | 0.076564 |
2014-04-01 00:20:00 | 18.137141 | 0.103122 |
... | ... | ... |
2014-04-14 23:35:00 | 18.269290 | 0.288071 |
2014-04-14 23:40:00 | 19.087351 | 1.207420 |
2014-04-14 23:45:00 | 19.594689 | 1.413067 |
2014-04-14 23:50:00 | 19.767817 | 1.401750 |
2014-04-14 23:55:00 | 20.479156 | 0.939501 |
4032 rows × 2 columns
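If training.csv is not at hand, a stand-in with the same shape can be generated to follow along. The sketch below is only an assumption about what such data might look like: the name s_demo, the daily sine cycle, and the noise level are made up for illustration, and only the index range (14 days at 5-minute steps, 4032 rows) and the value2 formula come from the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for training.csv: 14 days at 5-minute resolution.
rng = np.random.default_rng(0)
index = pd.date_range("2014-04-01", periods=4032, freq="5min", name="timestamp")

# A daily cycle plus noise; the shape and scale here are assumptions.
phase = 2 * np.pi * np.arange(len(index)) / 288  # 288 samples per day
s_demo = pd.DataFrame(
    {"value": 20 + 2 * np.sin(phase) + rng.normal(0, 0.5, len(index))},
    index=index,
)

# Same derived column as in the tutorial.
s_demo["value2"] = np.sin(s_demo["value"]) + np.cos(s_demo["value"])
print(s_demo.shape)
```

Such a frame can be used in place of s_train in the steps that follow, although the detected anomalies will of course differ from those in the real data.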
from adtk.visualization import plot
plot(s_train)
We will perform anomaly detection using three multivariate detectors: OutlierDetector, RegressionAD, and PcaAD.
For other methods, refer to the Detector section of the ADTK documentation.
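To give a rough idea of what a regression-based detector looks for, here is a minimal NumPy sketch. It is not ADTK's implementation. It fits a straight line between two variables and flags points whose absolute residual is unusually large by an interquartile-range rule. The function name regression_residual_flags, the toy data, and the exact threshold rule (Q3 + c * IQR) are assumptions made for illustration.

```python
import numpy as np

def regression_residual_flags(x, y, c=3.0):
    """Flag points whose absolute regression residual exceeds Q3 + c * IQR."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Fit y ~ a * x + b by ordinary least squares.
    a, b = np.polyfit(x, y, deg=1)
    residual = np.abs(y - (a * x + b))
    # Interquartile-range rule on the residuals (upper bound only).
    q1, q3 = np.percentile(residual, [25, 75])
    threshold = q3 + c * (q3 - q1)
    return residual > threshold

# Toy data: a perfect linear relationship with one injected violation.
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0
y[20] += 30.0
flags = regression_residual_flags(x, y)
print(np.flatnonzero(flags))
```

The point of such a detector is that a value can look ordinary on its own and still be anomalous because it breaks the usual relationship with the other series.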
import matplotlib.pyplot as plt
from adtk.detector import OutlierDetector, PcaAD, RegressionAD
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import LocalOutlierFactor
model_dict = {
    "OutlierDetector": OutlierDetector(LocalOutlierFactor(contamination=0.05)),
    "RegressionAD": RegressionAD(regressor=LinearRegression(), target="value2", c=3.0),
    "PcaAD": PcaAD(k=2),
}
for model_name, model in model_dict.items():
    anomalies = model.fit_detect(s_train)
    plot(
        s_train,
        anomaly=anomalies,
        ts_linewidth=1,
        ts_markersize=3,
        anomaly_color="red",
        anomaly_alpha=0.3,
        curve_group="all",
    )
    plt.title(model_name)
    plt.show()
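The plots can be complemented with a quick numeric comparison of how many points each detector flags. The sketch below assumes each detector's result is a boolean pandas Series, possibly containing NaN where no decision could be made. The helper name summarize_anomalies and the toy input are hypothetical.

```python
import numpy as np
import pandas as pd

def summarize_anomalies(results):
    """Return the count and share of flagged points per detector."""
    rows = []
    for name, flags in results.items():
        n_flagged = int(flags.eq(True).sum())  # NaN counts as not flagged
        rows.append(
            {"detector": name, "n_anomalies": n_flagged, "share": n_flagged / len(flags)}
        )
    return pd.DataFrame(rows).set_index("detector")

# Toy results standing in for the output of two detectors.
idx = pd.date_range("2014-04-01", periods=4, freq="5min")
toy_results = {
    "A": pd.Series([False, True, False, True], index=idx),
    "B": pd.Series([False, np.nan, True, False], index=idx),
}
summary = summarize_anomalies(toy_results)
print(summary)
```

In the loop above, one could store each anomalies result in a dict keyed by model_name and pass that dict to this helper. For the OutlierDetector configured with contamination=0.05, the share should come out at roughly 5%, since that parameter sets the expected proportion of outliers.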