- Linear classifiers separate classes with hyperplanes, offering interpretability and fast training.
- The perceptron, logistic regression, SVM, and related methods share the same linear decision rule and differ mainly in how the hyperplane is chosen, which makes comparing them instructive.
- Concepts learned here connect to regularisation, kernels, and distance-based learning.
Linear Classification #
Intuition #
Linear classifiers place a hyperplane in feature space and assign labels according to which side of the hyperplane a sample falls on. From the simple perceptron through probabilistic logistic regression to the margin-based SVM, the key difference lies in how that hyperplane is chosen.
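To make this concrete, here is a minimal NumPy sketch of the decision rule; the weight vector, bias, and sample points are made-up illustrative values, not the result of any training procedure.

```python
import numpy as np

# Hand-picked hyperplane w·x + b = 0 in 2D (illustrative values, not learned).
w = np.array([1.0, -2.0])
b = 0.5

# A few toy points on either side of the hyperplane.
X = np.array([[3.0, 1.0],    # w·x + b =  1.5 -> class +1
              [0.0, 1.0],    # w·x + b = -1.5 -> class -1
              [1.0, 0.75]])  # w·x + b =  0.0 -> on the boundary

scores = X @ w + b
labels = np.where(scores >= 0, 1, -1)
print(scores)  # [ 1.5 -1.5  0. ]
print(labels)  # [ 1 -1  1]
```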
Mathematical formulation #
A generic linear classifier predicts \(\hat{y} = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)\). Training methods differ in how they determine \(\mathbf{w}\) and \(b\); probabilistic variants pass the linear score through a sigmoid or softmax to obtain class probabilities. The kernel trick replaces the dot product with a kernel function, giving non-linear boundaries while keeping the same recipe.
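A rough sketch of how the same linear score feeds the sigmoid and softmax variants (with randomly drawn weights standing in for fitted ones) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # one sample with 4 features (random, illustrative)

# Binary case: the sigmoid turns the linear score into P(y = 1 | x).
w, b = rng.normal(size=4), 0.1
score = w @ x + b
p_pos = 1.0 / (1.0 + np.exp(-score))

# Multiclass case: softmax over one linear score per class.
W, b_vec = rng.normal(size=(3, 4)), np.zeros(3)   # 3 classes
scores = W @ x + b_vec
probs = np.exp(scores - scores.max())             # shift for numerical stability
probs /= probs.sum()

print(p_pos)                # a single probability in (0, 1)
print(probs, probs.sum())   # three class probabilities summing to 1
```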
Experiments with Python #
This chapter walks through hands-on examples in Python and scikit-learn:
- Perceptron: update rules and decision boundaries for the simplest linear classifier (a minimal sketch of the update rule follows this list).
- Logistic regression: probabilistic binary classification with cross-entropy.
- Softmax regression: multinomial extension that outputs all class probabilities.
- Linear discriminant analysis (LDA): directions that maximise class separability.
- Support vector machines (SVM): margin maximisation and kernel tricks.
- Naive Bayes: fast classification under the conditional independence assumption.
- k-Nearest neighbours: lazy learning based on distances.
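To give a flavour of the perceptron update rule referenced above, a minimal sketch on a made-up, linearly separable toy set could look like this (the learning rate and data are arbitrary illustrative choices):

```python
import numpy as np

# Tiny linearly separable toy set with labels in {-1, +1} (made-up values).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
b = 0.0
eta = 1.0  # learning rate

# Classic perceptron rule: update only on misclassified samples.
for epoch in range(10):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:   # wrong side of (or on) the hyperplane
            w += eta * yi * xi
            b += eta * yi
            errors += 1
    if errors == 0:                  # converged on separable data
        break

print(w, b)
```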
You can copy the code snippets and experiment to observe how decision boundaries and probabilities behave.
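As one possible starting point, the sketch below fits a few of the classifiers listed above to a single synthetic dataset with scikit-learn and compares their test accuracy; the dataset parameters and model choices are arbitrary placeholders you can swap freely.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic 2D binary problem so decision boundaries are easy to plot later.
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Perceptron": Perceptron(),
    "Logistic regression": LogisticRegression(),
    "Linear SVM": LinearSVC(),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```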
References #
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.