A quick-reference dictionary of key terms used throughout this site, with links to detailed pages.

## Statistics & Data Preparation

| Term | Description | Related page |
|---|---|---|
| Feature | An input variable to the model; also called an explanatory variable | Feature selection |
| Target / Label | The output variable to predict | — |
| Missing value | An unrecorded or unknown entry in the dataset | Preparation |
| Outlier | A data point that deviates significantly from other observations | Isolation Forest |
| Normalisation | Scaling values to a 0–1 range | Preparation |
| Standardisation | Scaling to zero mean and unit variance | Preparation |
| One-hot encoding | Converting categorical variables into binary vectors | Preparation |
| Curse of dimensionality | Data becomes sparse as feature dimensions increase | PCA |
| SMOTE | Generates synthetic minority samples to address class imbalance | SMOTE |
| Stratified Sampling | Sampling that preserves each subgroup's (stratum's) proportion in the population | Stratified Sampling |
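
Several of the preparation terms above can be sketched in a few lines of NumPy (a minimal illustration; the array values are invented):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Normalisation: rescale values to the 0-1 range
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardisation: shift to zero mean, scale to unit variance
x_std = (x - x.mean()) / x.std()

# One-hot encoding: one binary column per category
cats = np.array(["red", "green", "red", "blue"])
labels = np.unique(cats)  # columns in sorted order: blue, green, red
one_hot = (cats[:, None] == labels[None, :]).astype(int)
```

Each one-hot row contains exactly a single 1, in the column of that sample's category.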

## Supervised Learning — Regression

| Term | Description | Related page |
|---|---|---|
| Linear Regression | Models a linear relationship between inputs and output | Linear Regression |
| Regularisation | Penalises model complexity to prevent overfitting | Ridge / Lasso |
| L1 Regularisation (Lasso) | Adds the absolute sum of coefficients as penalty; produces sparse solutions | Ridge / Lasso |
| L2 Regularisation (Ridge) | Adds the squared sum of coefficients as penalty; shrinks coefficients | Ridge / Lasso |
| Multicollinearity | Strong correlation among features that destabilises estimation | PCA |
| Residual | Difference between observed and predicted values | MAE / RMSE |
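
The shrinking effect of L2 regularisation is visible in the closed-form ridge solution (a sketch on synthetic data; `alpha` is the penalty strength, and `alpha = 0` recovers ordinary least squares):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge(X, y, alpha):
    # L2 penalty enters the normal equations as (X^T X + alpha I) w = X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

w_ols = ridge(X, y, alpha=0.0)   # plain least squares
w_l2 = ridge(X, y, alpha=10.0)   # ridge: coefficients pulled toward zero
```

Any positive `alpha` makes the coefficient vector strictly smaller in norm than the unpenalised fit.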

## Supervised Learning — Classification

| Term | Description | Related page |
|---|---|---|
| Logistic Regression | Linear classifier using the sigmoid function for probability estimation | Logistic Regression |
| Decision boundary | The hyperplane or surface that separates classes | SVM |
| Confusion matrix | A 2×2 (or n×n) table of TP / FP / FN / TN | Confusion Matrix |
| Precision | Fraction of positive predictions that are truly positive | Precision / Recall |
| Recall / Sensitivity | Fraction of actual positives correctly identified | Precision / Recall |
| F1-score | Harmonic mean of precision and recall | F1-score |
| ROC-AUC | Area under the ROC curve; threshold-independent ranking metric | ROC-AUC |
| Imbalanced data | Dataset with highly skewed class distribution | Balanced Accuracy |
| MLE | Maximum Likelihood Estimation — estimates parameters by maximising data likelihood | MLE |
| Ordinal Regression | Regression/classification for ordered categorical outcomes | Ordinal Regression |
| One-Class SVM | Unsupervised anomaly detection using the SVM framework | One-Class SVM |
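
Precision, recall, and F1 follow directly from confusion-matrix counts (a pure-Python sketch; the counts are invented):

```python
def prf1(tp, fp, fn):
    # Precision: of everything predicted positive, how much was right
    precision = tp / (tp + fp)
    # Recall: of everything actually positive, how much was found
    recall = tp / (tp + fn)
    # F1: harmonic mean of the two
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf1(tp=8, fp=2, fn=4)  # precision 8/10, recall 8/12
```

Because F1 is a harmonic mean, it sits closer to the worse of the two scores, which is why it is preferred over a plain average for imbalanced data.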

## Decision Trees & Ensembles

| Term | Description | Related page |
|---|---|---|
| Decision tree | A tree-structured model that predicts via rule-based splits | Decision Tree |
| Gini impurity | A measure of node impurity used as a split criterion | Parameters |
| Information gain | Reduction in entropy achieved by a split | Parameters |
| Bagging | Trains multiple models on bootstrap samples and averages predictions | Bagging |
| Boosting | Sequentially adds weak learners to reduce residual error | Gradient Boosting |
| Random Forest | Bagging + random feature selection per split | Random Forest |
| Gradient Boosting | Adds weak learners in the direction of the loss gradient | Gradient Boosting |
| XGBoost | Optimised gradient boosting with regularisation and approximation | XGBoost |
| LightGBM | Histogram-based fast gradient boosting | LightGBM |
| Stacking | Combines predictions from multiple models via a meta-learner | Stacking |
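
Gini impurity and the impurity reduction of a split can be computed by hand (a toy example where the split separates the classes perfectly):

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

parent = np.array([0, 0, 0, 1, 1, 1])
left, right = parent[:3], parent[3:]  # a perfect split: pure children

weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
gain = gini(parent) - weighted  # impurity reduction achieved by the split
```

A tree greedily picks the split with the largest such reduction; information gain is the same idea with entropy in place of Gini impurity.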

## Interpretability & Explainable AI

| Term | Description | Related page |
|---|---|---|
| SHAP | Game-theory-based method that decomposes predictions into per-feature contributions | SHAP |
| PDP | Partial Dependence Plot — shows the average effect of a feature on prediction | PDP / ICE |
| ICE | Individual Conditional Expectation — per-sample feature effect curves | PDP / ICE |
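
For a plain linear model, the per-feature contributions that SHAP computes have a known closed form (assuming independent features): coefficient times the feature's deviation from its mean. A NumPy sketch with made-up numbers:

```python
import numpy as np

# Hypothetical linear model f(x) = w . x (intercept omitted for brevity)
w = np.array([2.0, -1.0])
X = np.array([[1.0, 3.0],
              [3.0, 1.0],
              [2.0, 2.0]])
x = X[0]

# Per-feature contribution relative to the average prediction
contrib = w * (x - X.mean(axis=0))
```

The contributions sum exactly to `f(x)` minus the mean prediction over the dataset, which is the decomposition SHAP guarantees in general.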

## Clustering & Dimensionality Reduction

| Term | Description | Related page |
|---|---|---|
| k-means | Centroid-based clustering that minimises within-cluster distances | k-means |
| DBSCAN | Density-based clustering that finds arbitrarily shaped clusters | DBSCAN |
| Silhouette score | Cluster quality metric ranging from -1 to 1 | k-means |
| PCA | Projects data onto orthogonal axes of maximum variance | PCA |
| Eigenvalue | The amount of variance explained by a principal component | PCA |
| t-SNE | Maps high-dimensional data to 2D/3D preserving local structure | t-SNE |
| UMAP | Faster non-linear dimensionality reduction preserving both local and global structure | UMAP |
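
PCA as described above amounts to an eigendecomposition of the covariance matrix: the eigenvectors are the principal axes and each eigenvalue is the variance explained along its axis. A minimal NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                 # centre the data first
cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order

order = np.argsort(eigvals)[::-1]       # sort by explained variance, descending
components = eigvecs[:, order]          # principal axes as columns
explained = eigvals[order] / eigvals.sum()
```

The explained-variance ratios sum to 1, and the components form an orthonormal basis.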

## Model Selection & Evaluation

| Term | Description | Related page |
|---|---|---|
| Cross-validation | Splits data into folds to estimate generalisation performance | Cross-validation |
| Hold-out | A single train/test split for quick evaluation | — |
| Overfitting | Model memorises training data and fails on unseen data | Validation curve |
| Underfitting | Model is too simple to capture patterns in the data | Learning curve |
| Bias-Variance tradeoff | Balance between model bias and variance; governs complexity tuning | Learning curve |
| Hyperparameter | A configuration value fixed before training rather than learned from the data | Validation curve |
| R² | Fraction of target variance explained by the model | R² |
| MAE | Mean Absolute Error — average of absolute prediction errors | MAE / RMSE |
| RMSE | Root Mean Squared Error — penalises large errors more heavily | MAE / RMSE |
| AIC / BIC | Information criteria balancing likelihood and model complexity | AIC / BIC |
| Optuna | Hyperparameter optimisation framework; its default sampler performs Bayesian (TPE) search | Optuna |
| Huber Loss | Robust loss combining MAE and MSE; reduces sensitivity to outliers | Huber Loss |
| Focal Loss | Loss function for imbalanced classification that focuses on hard samples | Focal Loss |
| PSI | Population Stability Index — monitors distribution drift over time | PSI |
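
The fold logic behind k-fold cross-validation fits in a few lines (a sketch; real code would fit and score a model inside the loop):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    # Shuffle once, then split into k roughly equal folds
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(n=10, k=3)
splits = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    splits.append((train_idx, test_idx))  # fit on train_idx, score on test_idx
```

Every sample appears in exactly one test fold, so averaging the k scores uses all of the data for evaluation without ever scoring a sample the fold's model was trained on.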

## Time Series

| Term | Description | Related page |
|---|---|---|
| Stationarity | Statistical properties do not change over time | Time series |
| Trend | Long-term upward or downward movement in a series | Time series |
| Seasonality | Regularly repeating patterns at fixed intervals | Holt-Winters |
| Autocorrelation | Correlation of a series with its own past values | Time series |
| ARIMA | Classic model combining autoregression, differencing, and moving average | ARIMA |
| Exponential Smoothing | Weights past values with exponentially decaying factors | Exponential Smoothing |
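
Simple exponential smoothing is short enough to write out directly (the series values are invented; `alpha` controls how quickly old observations are forgotten):

```python
def exp_smooth(series, alpha):
    # Recurrence: s_t = alpha * x_t + (1 - alpha) * s_{t-1}
    s = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s

smoothed = exp_smooth([10, 12, 11, 15, 14], alpha=0.5)
```

With `alpha = 0.5` each smoothed value is the average of the new observation and the previous smoothed value; smaller `alpha` gives a smoother, slower-reacting series.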

## Distance & Similarity

| Term | Description | Related page |
|---|---|---|
| Cosine similarity | Measures directional agreement of vectors on a -1 to 1 scale | Cosine Similarity |
| KL Divergence | Asymmetric information-theoretic divergence between distributions (not a true metric) | KLD |
| Wasserstein distance | Minimum transport cost to transform one distribution into another | Wasserstein |
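
Cosine similarity and KL divergence are both one-liners in NumPy (a sketch; the example distributions are made up, and KL assumes strictly positive probabilities):

```python
import numpy as np

def cosine_sim(a, b):
    # Directional agreement: 1 = same direction, 0 = orthogonal, -1 = opposite
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def kl_div(p, q):
    # D_KL(p || q); asymmetric, so kl_div(p, q) != kl_div(q, p) in general
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
```

The asymmetry is why KL divergence is called a divergence rather than a distance.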

## Visualisation

| Term | Description | Related page |
|---|---|---|
| Histogram | Bins continuous values to show frequency distribution | Histogram |
| Scatter plot | Plots two variables as points to reveal relationships | Scatter plot |
| Heatmap | Uses colour intensity to represent values in a matrix layout | Correlation Heatmap |
| Box plot | Summarises quartiles and outliers in a compact graphic | Visualisation |
| Violin plot | Combines box plot with kernel density estimation for distribution shape | Violin plot |
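
The summary statistics behind histograms and box plots can be reproduced with NumPy (toy data; the 1.5 × IQR rule is the conventional box-plot outlier fence):

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 9])

# Histogram: bin the values and count per bin
counts, edges = np.histogram(data, bins=4, range=(0, 10))

# Box-plot summary: quartiles plus outlier fences (1.5 * IQR rule)
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
```

Here the value 9 falls above the upper fence, so a box plot would draw it as an individual outlier point.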