4.3.2
Precision-Recall
- Understand the fundamentals of this metric, what it evaluates, and how to interpret the results.
- Compute and visualise the metric with Python 3.13 code examples, covering key steps and practical checkpoints.
- Combine charts and complementary metrics for effective model comparison and threshold tuning.
- Prerequisite: Confusion Matrix — understanding this concept first will make learning smoother
1. Definitions at a glance
Given the confusion-matrix counts — true positives (TP), false positives (FP), and false negatives (FN) — precision and recall are defined as:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
- Precision — of the items predicted as positive, which fraction is truly positive? Important when false alarms are costly.
- Recall — of the actual positives, which fraction do we catch? Important when misses must be avoided.
- F1 score — the harmonic mean of precision and recall, balancing both in a single number:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
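A quick numeric check of the three formulas (the counts TP=8, FP=2, FN=4 are made up for illustration):

```python
tp, fp, fn = 8, 2, 4                      # toy confusion-matrix counts

precision = tp / (tp + fp)                # 8/10 = 0.8
recall = tp / (tp + fn)                   # 8/12 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.667 0.727
```

Note that the harmonic mean pulls F1 toward the smaller of the two: here recall drags F1 down to roughly 0.727 even though precision is 0.8.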
2. Implementation and PR curve in Python 3.13
Install the dependencies into your environment:
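A typical install command (assuming scikit-learn and matplotlib are the only dependencies this section needs):

```shell
# Assumed dependencies for this section: scikit-learn (models, metrics) and matplotlib (plots)
python -m pip install scikit-learn matplotlib
```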
The snippet below creates an imbalanced dataset (positive class 5%), trains a weighted logistic regression, and plots the precision–recall (PR) curve along with the average precision (AP). The figure is stored at static/images/eval/classification/precision-recall/pr_curve.png so generate_eval_assets.py can refresh it automatically.
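The original snippet is not reproduced here; the following is a minimal sketch matching the description above (the dataset size, random seeds, and LogisticRegression settings are assumptions):

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also works headless
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced dataset: roughly 5% positives, as described above
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=42
)

# class_weight="balanced" compensates for the rare positive class
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_test, scores)
ap = average_precision_score(y_test, scores)

out = Path("static/images/eval/classification/precision-recall/pr_curve.png")
out.parent.mkdir(parents=True, exist_ok=True)

fig, ax = plt.subplots()
ax.plot(recall, precision)
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.set_title(f"PR curve (AP = {ap:.3f})")
fig.savefig(out)
```

`precision_recall_curve` needs scores, not hard labels, which is why the sketch passes `predict_proba(...)[:, 1]` rather than `predict(...)`.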

Average precision (AP) summarises the PR curve in a single number; scikit-learn computes it as a precision-weighted sum over recall steps, which approximates the area under the curve. Moving the decision threshold slides you along the curve.
3. Choosing a decision threshold
- Keeping the default threshold (0.5) might yield low recall when the positive class is rare.
- Every point on the PR curve corresponds to a threshold from precision_recall_curve’s thresholds array.
- Lowering the threshold increases recall at the expense of precision; use business cost to decide how far you can move.
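A sketch of such a threshold sweep (the synthetic scores and the 0.80 recall target are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.05, size=2000)          # rare positive class
# Hypothetical scores: positives tend to score higher than negatives
scores = np.clip(rng.normal(0.3 + 0.4 * y_true, 0.15), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# precision/recall have one more entry than thresholds; drop the final point
target_recall = 0.80                                # assumed business requirement
meets_target = recall[:-1] >= target_recall
# Among thresholds that still reach the target recall, take the most precise one
best = int(np.argmax(np.where(meets_target, precision[:-1], -1.0)))

print(
    f"threshold={thresholds[best]:.3f} "
    f"precision={precision[best]:.3f} recall={recall[best]:.3f}"
)
```

Swapping the roles (fix a precision floor, maximise recall) is the same one-liner with the arrays exchanged; which constraint is fixed should come from the business cost of each error type.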
4. Averaging strategies for multiclass tasks
scikit-learn’s average parameter lets you aggregate class-wise metrics:
- macro — simple mean across classes; treats each class equally.
- weighted — weighted mean by class support; preserves overall balance.
- micro — recompute from the pooled confusion matrix; useful but can hide minority-class behaviour.
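A sketch comparing the three strategies on one set of predictions (the Iris dataset and the classifier choice are assumptions; the point is the average parameter):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
y_pred = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)

# Same predictions, three aggregation strategies
for avg in ("macro", "weighted", "micro"):
    p = precision_score(y_test, y_pred, average=avg)
    r = recall_score(y_test, y_pred, average=avg)
    f = f1_score(y_test, y_pred, average=avg)
    print(f"{avg:>8}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

On a single-label multiclass task, micro precision, micro recall, and micro F1 all collapse to plain accuracy, which is exactly why micro averaging can mask poor performance on a minority class.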
Summary
- Precision penalises false positives, recall penalises false negatives; F1 balances both.
- Precision–recall curves reveal how the trade-off changes with the threshold; the average precision condenses it into a single number.
- In Python 3.13 you can compute and plot the curve with a few lines of scikit-learn, then share the threshold analysis so the team can choose an operating point together.