Linear Discriminant Analysis (LDA) | Dimensionality reduction with class separation

Created: 2019-03-26 Last updated: 2020-03-11 Read time: 1 min

Summary

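LDA finds the directions that maximize the ratio of between-class variance to within-class variance, then projects the data onto those directions to reduce dimensionality.
It suits preprocessing for multi-class classification, and explaining which directions separate the classes most.
The output has at most $K-1$ dimensions for $K$ classes (and no more than the number of features), because the $K$ class means, measured from the overall mean, span at most $K-1$ independent directions.
LinearDiscriminantAnalysis in scikit-learn provides both a classifier and a transform method for dimensionality reduction.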
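As a rough illustration of the points above, the sketch below fits a single LinearDiscriminantAnalysis model and uses it both ways: transform for dimensionality reduction and predict for classification. The synthetic dataset and its sizes are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic data (illustrative assumption): 3 classes, 5 features
X, y = make_classification(
    n_samples=300,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# With n_components left unset, LDA keeps min(K - 1, n_features) components
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

X_proj = lda.transform(X)  # dimensionality reduction
y_pred = lda.predict(X)    # classification with the same fitted model

print(X_proj.shape)  # K = 3 classes, so K - 1 = 2 columns
print(y_pred.shape)
```

Asking for n_components larger than $K-1$ is rejected by scikit-learn, which is consistent with the bound stated above.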

Key formulas #

Given the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$, find the vector $w$ that maximizes

$$ J(w) = \frac{w^\top S_B w}{w^\top S_W w} $$

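Solve it by computing the eigenvectors of $S_W^{-1} S_B$ and keeping those with the largest eigenvalues. At most $K-1$ eigenvalues are nonzero, so at most $K-1$ vectors are kept.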
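The eigenvector recipe can be sketched directly in NumPy and SciPy. This is a minimal sketch rather than scikit-learn's internal implementation: the toy data from make_blobs is an assumption, and the code solves the generalized problem $S_B w = \lambda S_W w$, which is equivalent to taking eigenvectors of $S_W^{-1} S_B$ when $S_W$ is invertible.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import make_blobs

# Toy data (illustrative assumption): K = 3 classes in d = 4 dimensions
X, y = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)
classes = np.unique(y)
K, d = len(classes), X.shape[1]
mean_all = X.mean(axis=0)

S_W = np.zeros((d, d))  # within-class scatter
S_B = np.zeros((d, d))  # between-class scatter
for c in classes:
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)
    S_W += (Xc - mean_c).T @ (Xc - mean_c)
    diff = (mean_c - mean_all).reshape(-1, 1)
    S_B += len(Xc) * (diff @ diff.T)

# Generalized symmetric eigenproblem S_B w = lambda S_W w.
# eigh returns eigenvalues in ascending order.
eigvals, eigvecs = eigh(S_B, S_W)

# Keep the K - 1 directions with the largest eigenvalues
W = eigvecs[:, ::-1][:, : K - 1]
X_proj = X @ W

print(eigvals[::-1])  # only the first K - 1 values are far from zero
print(X_proj.shape)
```

Each eigenvalue equals $J(w)$ for its eigenvector, so sorting by eigenvalue ranks the directions by how well they separate the classes.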

Python example #

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic dataset: 300 samples, 5 features, 3 classes
X, y = make_classification(
    n_samples=300,
    n_features=5,
    n_informative=3,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# K = 3 classes, so at most K - 1 = 2 discriminant components are available
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)

# Scatter plot of the two discriminant components, colored by class
plt.figure(figsize=(6, 5))
plt.scatter(X_proj[:, 0], X_proj[:, 1], c=y, cmap="tab10")
plt.title("Dimensionality reduction with LDA (2 components)")
plt.xlabel("LD1")
plt.ylabel("LD2")
plt.grid(alpha=0.2)
plt.tight_layout()
plt.show()

Figure: projection of the data with LDA

Tips #

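LDA needs an invertible within-class scatter matrix; when features are collinear or outnumber the samples, reduce the dimension with PCA first.
It works well as a preprocessing step for other models such as SVM or logistic regression.
Set store_covariance=True to keep the weighted within-class covariance matrix in covariance_. Note that LDA stores a single matrix shared by all classes, not one matrix per class.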
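The tips above can be combined in one short sketch: PCA in front of LDA, LDA feeding a logistic regression, and store_covariance=True for inspection. The dataset, the number of PCA components, and max_iter are assumptions picked for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (illustrative assumption): 3 classes, 5 features
X, y = make_classification(
    n_samples=300,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# PCA first, then LDA as a supervised reduction step, then another classifier
clf = make_pipeline(
    PCA(n_components=4),
    LinearDiscriminantAnalysis(n_components=2),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())

# store_covariance=True keeps the pooled within-class covariance in covariance_
lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(lda.covariance_.shape)  # one (n_features, n_features) matrix
```

Putting the steps in a pipeline means PCA and LDA are refit on each training fold, so the cross-validation score is not inflated by information from the held-out fold.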

References #

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics.
scikit-learn developers. (2024). Linear Discriminant Analysis. https://scikit-learn.org/stable/modules/lda_qda.html