A quick-reference dictionary of key terms used throughout this site, with links to detailed pages.

## Statistics & Data Preparation

| Term | Description | Related page |
|---|---|---|
| Feature | An input variable to the model; also called an explanatory variable | Feature selection |
| Target / Label | The output variable to predict | — |
| Missing value | An unrecorded or unknown entry in the dataset | Preparation |
| Outlier | A data point that deviates significantly from other observations | Isolation Forest |
| Normalisation | Scaling values to a 0–1 range | Preparation |
| Standardisation | Scaling to zero mean and unit variance | Preparation |
| One-hot encoding | Converting categorical variables into binary vectors | Preparation |
| Curse of dimensionality | Data becomes sparse as feature dimensions increase | PCA |
| SMOTE | Generates synthetic minority samples to address class imbalance | SMOTE |
| Stratified Sampling | Sampling that preserves each subgroup's (stratum's) proportion in the population | Stratified Sampling |
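
Several of the preparation terms above can be sketched in a few lines of NumPy (a minimal illustration; the array values are invented):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Normalisation: rescale values to the 0-1 range
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardisation: shift to zero mean, scale to unit variance
x_std = (x - x.mean()) / x.std()

# One-hot encoding: one binary column per category
cats = np.array(["red", "green", "red", "blue"])
labels = np.unique(cats)  # columns in sorted order: blue, green, red
one_hot = (cats[:, None] == labels[None, :]).astype(int)
```

Each one-hot row contains exactly a single 1, in the column of that sample's category.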

## Supervised Learning — Regression

| Term | Description | Related page |
|---|---|---|
| Linear Regression | Models a linear relationship between inputs and output | Linear Regression |
| Regularisation | Penalises model complexity to prevent overfitting | Ridge / Lasso |
| L1 Regularisation (Lasso) | Adds the absolute sum of coefficients as penalty; produces sparse solutions | Ridge / Lasso |
| L2 Regularisation (Ridge) | Adds the squared sum of coefficients as penalty; shrinks coefficients | Ridge / Lasso |
| Multicollinearity | Strong correlation among features that destabilises estimation | PCA |
| Residual | Difference between observed and predicted values | MAE / RMSE |
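
The shrinking effect of L2 regularisation is visible in the closed-form ridge solution (a sketch on synthetic data; `alpha` is the penalty strength, and `alpha = 0` recovers ordinary least squares):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge(X, y, alpha):
    # L2 penalty enters the normal equations as (X^T X + alpha I) w = X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

w_ols = ridge(X, y, alpha=0.0)   # plain least squares
w_l2 = ridge(X, y, alpha=10.0)   # ridge: coefficients pulled toward zero
```

Any positive `alpha` makes the coefficient vector strictly smaller in norm than the unpenalised fit.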

## Supervised Learning — Classification

| Term | Description | Related page |
|---|---|---|
| Logistic Regression | Linear classifier using the sigmoid function for probability estimation | Logistic Regression |
| Decision boundary | The hyperplane or surface that separates classes | SVM |
| Confusion matrix | A 2×2 (or n×n) table of TP / FP / FN / TN | Confusion Matrix |
| Precision | Fraction of positive predictions that are truly positive | Precision / Recall |
| Recall / Sensitivity | Fraction of actual positives correctly identified | Precision / Recall |
| F1-score | Harmonic mean of precision and recall | F1-score |
| ROC-AUC | Area under the ROC curve; threshold-independent ranking metric | ROC-AUC |
| Imbalanced data | Dataset with highly skewed class distribution | Balanced Accuracy |
| MLE | Maximum Likelihood Estimation — estimates parameters by maximising data likelihood | MLE |
| Ordinal Regression | Regression/classification for ordered categorical outcomes | Ordinal Regression |
| One-Class SVM | Unsupervised anomaly detection using the SVM framework | One-Class SVM |
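
Precision, recall, and F1 follow directly from confusion-matrix counts (a pure-Python sketch; the counts are invented):

```python
def prf1(tp, fp, fn):
    # Precision: of everything predicted positive, how much was right
    precision = tp / (tp + fp)
    # Recall: of everything actually positive, how much was found
    recall = tp / (tp + fn)
    # F1: harmonic mean of the two
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf1(tp=8, fp=2, fn=4)  # precision 8/10, recall 8/12
```

Because F1 is a harmonic mean, it sits closer to the worse of the two scores, which is why it is preferred over a plain average for imbalanced data.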

## Decision Trees & Ensembles

| Term | Description | Related page |
|---|---|---|
| Decision tree | A tree-structured model that predicts via rule-based splits | Decision Tree |
| Gini impurity | A measure of node impurity used as a split criterion | Parameters |
| Information gain | Reduction in entropy achieved by a split | Parameters |
| Bagging | Trains multiple models on bootstrap samples and averages predictions | Bagging |
| Boosting | Sequentially adds weak learners to reduce residual error | Gradient Boosting |
| Random Forest | Bagging + random feature selection per split | Random Forest |
| Gradient Boosting | Adds weak learners in the direction of the loss gradient | Gradient Boosting |
| XGBoost | Optimised gradient boosting with regularisation and approximation | XGBoost |
| LightGBM | Histogram-based fast gradient boosting | LightGBM |
| Stacking | Combines predictions from multiple models via a meta-learner | Stacking |
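
Gini impurity and the impurity reduction of a split can be computed by hand (a toy example where the split separates the classes perfectly):

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

parent = np.array([0, 0, 0, 1, 1, 1])
left, right = parent[:3], parent[3:]  # a perfect split: pure children

weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
gain = gini(parent) - weighted  # impurity reduction achieved by the split
```

A tree greedily picks the split with the largest such reduction; information gain is the same idea with entropy in place of Gini impurity.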

## Interpretability & Explainable AI

| Term | Description | Related page |
|---|---|---|
| SHAP | Game-theory-based method that decomposes predictions into per-feature contributions | SHAP |
| PDP | Partial Dependence Plot — shows the average effect of a feature on prediction | PDP / ICE |
| ICE | Individual Conditional Expectation — per-sample feature effect curves | PDP / ICE |
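
For a plain linear model, the per-feature contributions that SHAP computes have a known closed form (assuming independent features): coefficient times the feature's deviation from its mean. A NumPy sketch with made-up numbers:

```python
import numpy as np

# Hypothetical linear model f(x) = w . x (intercept omitted for brevity)
w = np.array([2.0, -1.0])
X = np.array([[1.0, 3.0],
              [3.0, 1.0],
              [2.0, 2.0]])
x = X[0]

# Per-feature contribution relative to the average prediction
contrib = w * (x - X.mean(axis=0))
```

The contributions sum exactly to `f(x)` minus the mean prediction over the dataset, which is the decomposition SHAP guarantees in general.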

## Clustering & Dimensionality Reduction

| Term | Description | Related page |
|---|---|---|
| k-means | Centroid-based clustering that minimises within-cluster distances | k-means |
| DBSCAN | Density-based clustering that finds arbitrarily shaped clusters | DBSCAN |
| Silhouette score | Cluster quality metric ranging from -1 to 1 | k-means |
| PCA | Projects data onto orthogonal axes of maximum variance | PCA |
| Eigenvalue | The amount of variance explained by a principal component | PCA |
| t-SNE | Maps high-dimensional data to 2D/3D preserving local structure | t-SNE |
| UMAP | Faster non-linear dimensionality reduction preserving both local and global structure | UMAP |
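
PCA as described above amounts to an eigendecomposition of the covariance matrix: the eigenvectors are the principal axes and each eigenvalue is the variance explained along its axis. A minimal NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                 # centre the data first
cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order

order = np.argsort(eigvals)[::-1]       # sort by explained variance, descending
components = eigvecs[:, order]          # principal axes as columns
explained = eigvals[order] / eigvals.sum()
```

The explained-variance ratios sum to 1, and the components form an orthonormal basis.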

## Model Selection & Evaluation

| Term | Description | Related page |
|---|---|---|
| Cross-validation | Splits data into folds to estimate generalisation performance | Cross-validation |
| Hold-out | A single train/test split for quick evaluation | — |
| Overfitting | Model memorises training data and fails on unseen data | Validation curve |
| Underfitting | Model is too simple to capture patterns in the data | Learning curve |
| Bias-Variance tradeoff | Balance between model bias and variance; governs complexity tuning | Learning curve |
| Hyperparameter | A configuration value fixed before training rather than learned from the data | Validation curve |
| R² | Fraction of target variance explained by the model | R² |
| MAE | Mean Absolute Error — average of absolute prediction errors | MAE / RMSE |
| RMSE | Root Mean Squared Error — penalises large errors more heavily | MAE / RMSE |
| AIC / BIC | Information criteria balancing likelihood and model complexity | AIC / BIC |
| Optuna | Hyperparameter optimisation framework; its default sampler performs Bayesian (TPE) search | Optuna |
| Huber Loss | Robust loss combining MAE and MSE; reduces sensitivity to outliers | Huber Loss |
| Focal Loss | Loss function for imbalanced classification that focuses on hard samples | Focal Loss |
| PSI | Population Stability Index — monitors distribution drift over time | PSI |
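
The fold logic behind k-fold cross-validation fits in a few lines (a sketch; real code would fit and score a model inside the loop):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    # Shuffle once, then split into k roughly equal folds
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(n=10, k=3)
splits = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    splits.append((train_idx, test_idx))  # fit on train_idx, score on test_idx
```

Every sample appears in exactly one test fold, so averaging the k scores uses all of the data for evaluation without ever scoring a sample the fold's model was trained on.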

## Time Series

| Term | Description | Related page |
|---|---|---|
| Stationarity | Statistical properties do not change over time | Time series |
| Trend | Long-term upward or downward movement in a series | Time series |
| Seasonality | Regularly repeating patterns at fixed intervals | Holt-Winters |
| Autocorrelation | Correlation of a series with its own past values | Time series |
| ARIMA | Classic model combining autoregression, differencing, and moving average | ARIMA |
| Exponential Smoothing | Weights past values with exponentially decaying factors | Exponential Smoothing |
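
Simple exponential smoothing is short enough to write out directly (the series values are invented; `alpha` controls how quickly old observations are forgotten):

```python
def exp_smooth(series, alpha):
    # Recurrence: s_t = alpha * x_t + (1 - alpha) * s_{t-1}
    s = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s

smoothed = exp_smooth([10, 12, 11, 15, 14], alpha=0.5)
```

With `alpha = 0.5` each smoothed value is the average of the new observation and the previous smoothed value; smaller `alpha` gives a smoother, slower-reacting series.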

## Distance & Similarity

| Term | Description | Related page |
|---|---|---|
| Cosine similarity | Measures directional agreement of vectors on a -1 to 1 scale | Cosine Similarity |
| KL Divergence | Asymmetric information-theoretic divergence between distributions (not a true metric) | KLD |
| Wasserstein distance | Minimum transport cost to transform one distribution into another | Wasserstein |
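
Cosine similarity and KL divergence are both one-liners in NumPy (a sketch; the example distributions are made up, and KL assumes strictly positive probabilities):

```python
import numpy as np

def cosine_sim(a, b):
    # Directional agreement: 1 = same direction, 0 = orthogonal, -1 = opposite
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def kl_div(p, q):
    # D_KL(p || q); asymmetric, so kl_div(p, q) != kl_div(q, p) in general
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
```

The asymmetry is why KL divergence is called a divergence rather than a distance.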

## Visualisation

| Term | Description | Related page |
|---|---|---|
| Histogram | Bins continuous values to show frequency distribution | Histogram |
| Scatter plot | Plots two variables as points to reveal relationships | Scatter plot |
| Heatmap | Uses colour intensity to represent values in a matrix layout | Correlation Heatmap |
| Box plot | Summarises quartiles and outliers in a compact graphic | Visualisation |
| Violin plot | Combines box plot with kernel density estimation for distribution shape | Violin plot |
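
The summary statistics behind histograms and box plots can be reproduced with NumPy (toy data; the 1.5 × IQR rule is the conventional box-plot outlier fence):

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 9])

# Histogram: bin the values and count per bin
counts, edges = np.histogram(data, bins=4, range=(0, 10))

# Box-plot summary: quartiles plus outlier fences (1.5 * IQR rule)
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
```

Here the value 9 falls above the upper fence, so a box plot would draw it as an individual outlier point.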