Glossary

A quick-reference dictionary of key terms used throughout this site, with links to detailed pages.


Statistics & Data Preparation

| Term | Description | Related page |
|---|---|---|
| Feature | An input variable to the model; also called an explanatory variable | Feature selection |
| Target / Label | The output variable to predict | |
| Missing value | An unrecorded or unknown entry in the dataset | Preparation |
| Outlier | A data point that deviates significantly from other observations | Isolation Forest |
| Normalisation | Rescaling values to a fixed range, typically 0–1 (min-max scaling) | Preparation |
| Standardisation | Rescaling to zero mean and unit variance | Preparation |
| One-hot encoding | Converting a categorical variable into binary indicator vectors | Preparation |
| Curse of dimensionality | Data becomes sparse as the number of feature dimensions increases | PCA |
| SMOTE | Generates synthetic minority-class samples to address class imbalance | SMOTE |
| Stratified Sampling | Sampling that preserves the proportion of each stratum (subgroup) in the population | Stratified Sampling |
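
The three scaling and encoding terms above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used on the linked pages; the function names and sample values are made up for the example, and it assumes the input values are not all identical.

```python
import statistics

def min_max_normalise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardise(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def one_hot(labels):
    """Encode each label as a binary vector, one position per sorted category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

heights = [150.0, 160.0, 170.0, 180.0]   # illustrative data
normalised = min_max_normalise(heights)  # first value 0.0, last value 1.0
standardised = standardise(heights)      # mean ~0, standard deviation ~1
encoded = one_hot(["red", "green", "red"])
print(encoded)  # [[0, 1], [1, 0], [0, 1]] with categories ordered green, red
```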

Supervised Learning: Regression

| Term | Description | Related page |
|---|---|---|
| Linear Regression | Models a linear relationship between inputs and output | Linear Regression |
| Regularisation | Penalises model complexity to prevent overfitting | Ridge / Lasso |
| L1 Regularisation (Lasso) | Adds the sum of absolute coefficient values as a penalty; produces sparse solutions | Ridge / Lasso |
| L2 Regularisation (Ridge) | Adds the sum of squared coefficients as a penalty; shrinks coefficients towards zero | Ridge / Lasso |
| Multicollinearity | Strong correlation among features that destabilises coefficient estimation | PCA |
| Residual | Difference between observed and predicted values | MAE / RMSE |
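
As a rough illustration of the penalty terms and of shrinkage, the sketch below computes L1 and L2 penalties for a coefficient vector and the closed-form ridge slope for a single feature with no intercept. The function names, the `alpha` parameter name, and the data are assumptions made for this example.

```python
def residuals(y_true, y_pred):
    """Observed minus predicted value for each sample."""
    return [t - p for t, p in zip(y_true, y_pred)]

def l1_penalty(coefs, alpha):
    """Lasso-style penalty: alpha times the sum of absolute coefficients."""
    return alpha * sum(abs(c) for c in coefs)

def l2_penalty(coefs, alpha):
    """Ridge-style penalty: alpha times the sum of squared coefficients."""
    return alpha * sum(c * c for c in coefs)

def ridge_slope(x, y, alpha):
    """Minimise sum((y - w*x)^2) + alpha*w^2 for one feature, no intercept.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + alpha).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]                  # exactly y = 2x
print(ridge_slope(x, y, alpha=0.0))  # 2.0, the ordinary least-squares slope
print(ridge_slope(x, y, alpha=14.0)) # 1.0, shrunk towards zero by the penalty
print(l1_penalty([3.0, -4.0], 0.5))  # 3.5
print(l2_penalty([3.0, -4.0], 0.5))  # 12.5
```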

Supervised Learning: Classification

| Term | Description | Related page |
|---|---|---|
| Logistic Regression | Linear classifier using the sigmoid function for probability estimation | Logistic Regression |
| Decision boundary | The hyperplane or surface that separates classes | SVM |
| Confusion matrix | A 2×2 table of TP / FP / FN / TN counts (n×n for n classes) | Confusion Matrix |
| Precision | Fraction of positive predictions that are truly positive | Precision / Recall |
| Recall / Sensitivity | Fraction of actual positives correctly identified | Precision / Recall |
| F1-score | Harmonic mean of precision and recall | F1-score |
| ROC-AUC | Area under the ROC curve; threshold-independent ranking metric | ROC-AUC |
| Imbalanced data | Dataset with highly skewed class distribution | Balanced Accuracy |
| MLE | Maximum Likelihood Estimation: estimates parameters by maximising the likelihood of the observed data | MLE |
| Ordinal Regression | Regression/classification for ordered categorical outcomes | Ordinal Regression |
| One-Class SVM | Unsupervised anomaly detection using the SVM framework | One-Class SVM |
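
The relationship between the confusion matrix, precision, recall, and F1-score can be made concrete with a small binary example. This is a sketch with invented labels; it assumes class 1 is the positive class and that at least one positive prediction and one actual positive exist.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)   # of predicted positives, how many are correct
    recall = tp / (tp + fn)      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # illustrative ground truth
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]   # illustrative predictions
print(confusion_counts(y_true, y_pred))     # (3, 1, 1, 3)
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```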

Decision Trees & Ensembles

| Term | Description | Related page |
|---|---|---|
| Decision tree | A tree-structured model that predicts via rule-based splits | Decision Tree |
| Gini impurity | A measure of node impurity used as a split criterion | Parameters |
| Information gain | Reduction in entropy achieved by a split | Parameters |
| Bagging | Trains multiple models on bootstrap samples and averages (or votes on) their predictions | Bagging |
| Boosting | Sequentially adds weak learners, each correcting the errors of the current ensemble | Gradient Boosting |
| Random Forest | Bagging + random feature selection per split | Random Forest |
| Gradient Boosting | Adds weak learners fitted to the negative gradient of the loss | Gradient Boosting |
| XGBoost | Optimised gradient boosting with regularisation and approximate split finding | XGBoost |
| LightGBM | Histogram-based fast gradient boosting | LightGBM |
| Stacking | Combines predictions from multiple models via a meta-learner | Stacking |
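
Gini impurity and information gain are both simple functions of the class proportions in a node. The sketch below, with illustrative function names and toy labels, computes them for a perfectly separating split.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]          # evenly mixed node
split = [[0, 0], [1, 1]]       # a split that separates the classes perfectly
print(gini_impurity(parent))            # 0.5
print(information_gain(parent, split))  # 1.0
```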

Interpretability & Explainable AI

| Term | Description | Related page |
|---|---|---|
| SHAP | Game-theory-based method (Shapley values) that decomposes a prediction into per-feature contributions | SHAP |
| PDP | Partial Dependence Plot: shows the average effect of a feature on the prediction | PDP / ICE |
| ICE | Individual Conditional Expectation: per-sample feature effect curves | PDP / ICE |
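
A PDP is the pointwise average of the ICE curves, which the following sketch makes explicit. The `model` function here is a hypothetical stand-in for a fitted model, and the rows and grid are invented for illustration.

```python
def model(x1, x2):
    """Hypothetical fitted model with an interaction between x1 and x2."""
    return 2 * x1 + x1 * x2

def ice_curves(rows, grid):
    """For each sample, vary x1 over the grid while holding its own x2 fixed."""
    return [[model(g, x2) for g in grid] for (_x1, x2) in rows]

def partial_dependence(rows, grid):
    """Average the ICE curves at each grid point."""
    curves = ice_curves(rows, grid)
    return [sum(curve[i] for curve in curves) / len(curves)
            for i in range(len(grid))]

rows = [(0, 1), (0, 3)]   # (x1, x2) samples
grid = [0, 1, 2]          # values of x1 to evaluate
print(ice_curves(rows, grid))          # [[0, 3, 6], [0, 5, 10]]
print(partial_dependence(rows, grid))  # [0.0, 4.0, 8.0]
```

The two ICE curves have different slopes because of the interaction term, a detail the averaged PDP hides.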

Clustering & Dimensionality Reduction

| Term | Description | Related page |
|---|---|---|
| k-means | Centroid-based clustering that minimises within-cluster squared distances to the centroids | k-means |
| DBSCAN | Density-based clustering that finds arbitrarily shaped clusters | DBSCAN |
| Silhouette score | Cluster quality metric ranging from -1 to 1; higher means better-separated clusters | k-means |
| PCA | Projects data onto orthogonal axes of maximum variance | PCA |
| Eigenvalue | In PCA, the amount of variance explained by a principal component | PCA |
| t-SNE | Maps high-dimensional data to 2D/3D while preserving local structure | t-SNE |
| UMAP | Non-linear dimensionality reduction, typically faster than t-SNE, that aims to preserve local and some global structure | UMAP |
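
To show where the -1 to 1 range of the silhouette score comes from, here is a minimal sketch for one-dimensional points using absolute distance. It assumes at least two clusters and assigns a score of 0 to singleton clusters; the function name and data are illustrative.

```python
def silhouette_score_1d(points, labels):
    """Mean silhouette over all points, using |p - q| as the distance."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(abs(p - q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [1.0, 2.0, 10.0, 11.0]
good = silhouette_score_1d(points, [0, 0, 1, 1])  # compact, well separated
bad = silhouette_score_1d(points, [0, 1, 0, 1])   # clusters interleaved
print(good > 0 > bad)  # True
```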

Model Selection & Evaluation

| Term | Description | Related page |
|---|---|---|
| Cross-validation | Splits data into folds to estimate generalisation performance | Cross-validation |
| Hold-out | A single train/test split for quick evaluation | |
| Overfitting | Model fits noise in the training data and fails to generalise to unseen data | Validation curve |
| Underfitting | Model is too simple to capture patterns in the data | Learning curve |
| Bias-Variance tradeoff | Balance between model bias and variance; governs complexity tuning | Learning curve |
| Hyperparameter | A configuration value set before training rather than learned from the data | Validation curve |
| R² | Fraction of target variance explained by the model | |
| MAE | Mean Absolute Error: average of absolute prediction errors | MAE / RMSE |
| RMSE | Root Mean Squared Error: penalises large errors more heavily | MAE / RMSE |
| AIC / BIC | Information criteria balancing likelihood and model complexity | AIC / BIC |
| Optuna | Hyperparameter optimisation framework; its default sampler (TPE) is a Bayesian optimisation method | Optuna |
| Huber Loss | Robust loss that is quadratic for small errors and linear for large ones; reduces sensitivity to outliers | Huber Loss |
| Focal Loss | Loss function for imbalanced classification that focuses on hard samples | Focal Loss |
| PSI | Population Stability Index: monitors distribution drift over time | PSI |
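
The three regression metrics in the table differ only in how they aggregate the residuals. A minimal sketch with made-up values follows; the function names are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean of absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the mean squared error; large errors dominate."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: fraction of target variance explained."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
print(mae(y_true, y_pred))        # 1.0
print(rmse(y_true, y_pred))       # about 1.2247, above MAE due to the error of 2
print(r_squared(y_true, y_pred))  # 0.7
```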

Time Series

| Term | Description | Related page |
|---|---|---|
| Stationarity | Statistical properties do not change over time | Time series |
| Trend | Long-term upward or downward movement in a series | Time series |
| Seasonality | Regularly repeating patterns at fixed intervals | Holt-Winters |
| Autocorrelation | Correlation of a series with its own past values | Time series |
| ARIMA | Classic model combining autoregression, differencing, and moving average | ARIMA |
| Exponential Smoothing | Weights past values with exponentially decaying factors | Exponential Smoothing |
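
Simple exponential smoothing and lag-k autocorrelation can each be written in a few lines. The sketch below assumes the smoothed series is initialised with the first observation and uses the common sample autocorrelation estimator; the names `alpha` and `lag` and the data are illustrative.

```python
def exponential_smoothing(series, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def autocorrelation(series, lag):
    """Sample autocorrelation: lagged covariance divided by total variance."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

print(exponential_smoothing([10.0, 12.0, 14.0], alpha=0.5))  # [10.0, 11.0, 12.5]
print(autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], lag=1))     # 0.4
```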

Distance & Similarity

| Term | Description | Related page |
|---|---|---|
| Cosine similarity | Measures directional agreement of vectors on a -1 to 1 scale | Cosine Similarity |
| KL Divergence | Asymmetric information-theoretic measure of how one distribution differs from another; not a true distance metric | KLD |
| Wasserstein distance | Minimum transport cost to transform one distribution into another | Wasserstein |
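
The asymmetry of KL divergence, and the scale of cosine similarity, are easy to check numerically. This sketch assumes discrete distributions with strictly positive probabilities and non-zero vectors; the function names and inputs are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def kl_divergence(p, q):
    """KL(p || q) = sum of p_i * ln(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 for orthogonal vectors

p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = kl_divergence(p, q)   # about 0.511
kl_qp = kl_divergence(q, p)   # about 0.368
print(kl_pq != kl_qp)         # True: swapping the arguments changes the value
```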

Visualisation

| Term | Description | Related page |
|---|---|---|
| Histogram | Bins continuous values to show frequency distribution | Histogram |
| Scatter plot | Plots two variables as points to reveal relationships | Scatter plot |
| Heatmap | Uses colour intensity to represent values in a matrix layout | Correlation Heatmap |
| Box plot | Summarises quartiles and outliers in a compact graphic | Visualisation |
| Violin plot | Combines box plot with kernel density estimation for distribution shape | Violin plot |