Balanced Accuracy

Last updated 2020-04-08 Read time 3 min
Summary
  • Understand the fundamentals of this metric, what it evaluates, and how to interpret the results.
  • Compute and visualise the metric with Python 3.13 code examples, covering key steps and practical checkpoints.
  • Combine charts and complementary metrics for effective model comparison and threshold tuning.
  • Prerequisite: Confusion Matrix. Understanding this concept first will make learning smoother.

1. Definition

Balanced Accuracy is the mean of the true-positive rate (TPR) and the true-negative rate (TNR):

$$
\mathrm{Balanced\ Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)
$$

For multiclass problems you average the recall of each class in the same spirit.
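
As a quick worked example of the formula, the sketch below computes plain Accuracy and Balanced Accuracy from raw confusion-matrix counts. The counts are invented for illustration (950 negatives, 50 positives, and a model that predicts negative for everything), the helper names are hypothetical, and the code assumes both classes actually occur so that neither denominator is zero.

```python
def balanced_accuracy_from_counts(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of TPR = TP / (TP + FN) and TNR = TN / (TN + FP)."""
    tpr = tp / (tp + fn)  # recall on the positive class
    tnr = tn / (tn + fp)  # recall on the negative class
    return 0.5 * (tpr + tnr)


def accuracy_from_counts(tp: int, fn: int, tn: int, fp: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Illustrative counts: a model that always predicts the majority (negative) class.
tp, fn, tn, fp = 0, 50, 950, 0

print(f"Accuracy: {accuracy_from_counts(tp, fn, tn, fp):.3f}")
print(f"Balanced Accuracy: {balanced_accuracy_from_counts(tp, fn, tn, fp):.3f}")
```

With these counts Accuracy comes out at 0.950 while Balanced Accuracy is 0.500, because the recall on the positive class is zero.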


2. Implementation in Python 3.13

python --version        # e.g. Python 3.13.0
pip install scikit-learn matplotlib

We reuse the random-forest classifier from the Accuracy article and print both metrics side by side. The bar chart is saved at static/images/eval/classification/accuracy/accuracy_vs_balanced.png, so generate_eval_assets.py can regenerate it whenever you update the notebook.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the binary breast-cancer dataset and hold out 25% for testing,
# stratified so both splits keep the original class ratio.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale the features, then fit a random forest.
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(random_state=42, n_estimators=300),
)
pipeline.fit(X_train, y_train)

# Score the held-out predictions with both metrics.
y_pred = pipeline.predict(X_test)
acc = accuracy_score(y_test, y_pred)
bal_acc = balanced_accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.3f}, Balanced Accuracy: {bal_acc:.3f}")
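
The listing above only prints the two scores. A minimal sketch of the bar chart mentioned earlier might look like the following. The helper name `save_metric_bars`, the figure size, the output file name, and the placeholder scores are all assumptions for illustration, not the article's actual plotting code; pass the `acc` and `bal_acc` values computed above in place of the placeholders.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script also runs headless
import matplotlib.pyplot as plt


def save_metric_bars(acc: float, bal_acc: float, out_path: Path) -> Path:
    """Draw Accuracy and Balanced Accuracy as two bars and save the figure."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(["Accuracy", "Balanced Accuracy"], [acc, bal_acc])
    ax.set_ylim(0.0, 1.0)  # both metrics live in [0, 1]
    ax.set_ylabel("Score")
    ax.set_title("Accuracy vs Balanced Accuracy")
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # release the figure's memory
    return out_path


# Placeholder scores and a hypothetical output file name.
saved = save_metric_bars(0.95, 0.90, Path("accuracy_vs_balanced.png"))
print(f"Saved chart to {saved}")
```
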

Figure: Accuracy vs Balanced Accuracy

Balanced Accuracy weights each class equally by averaging the recall per class.
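
To make the per-class averaging concrete, here is a small pure-Python sketch that computes the recall of each class and then takes the unweighted mean. The labels are toy data and the function names are hypothetical; for real work, `balanced_accuracy_score` from scikit-learn is the natural choice.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Toy three-class example: class "a" dominates, class "c" is rare.
y_true = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
y_pred = ["a"] * 6 + ["b", "b", "a"] + ["a"]

print(per_class_recall(y_true, y_pred))
print(f"Balanced Accuracy: {balanced_accuracy(y_true, y_pred):.3f}")
```

Eight of the ten toy predictions are correct, so plain Accuracy is 0.8, yet the missed class "c" pulls Balanced Accuracy down to roughly 0.556.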


3. When to prefer Balanced Accuracy

  • Strong class imbalance – plain Accuracy is dominated by the majority class, while Balanced Accuracy keeps minority recall visible.
  • Model comparison – when teams submit models evaluated on skewed data, Balanced Accuracy gives a fairer comparison, because a model cannot score well simply by favouring the majority class.
  • Threshold tuning – combine it with precision/recall plots to see whether both classes remain detectable at your chosen threshold.
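
For the threshold-tuning case, a simple sweep could be sketched as below. The labels and scores are made up for illustration and `balanced_accuracy_at` is a hypothetical helper; in practice the scores would come from something like `predict_proba` on a held-out set.

```python
def balanced_accuracy_at(threshold, y_true, scores):
    """Balanced Accuracy when predicting positive for score >= threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Toy data: 8 negatives and 2 positives (imbalanced on purpose).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.80]

thresholds = [0.3, 0.4, 0.5, 0.7]
results = {t: balanced_accuracy_at(t, y_true, scores) for t in thresholds}
best = max(results, key=results.get)

for t, value in results.items():
    print(f"threshold={t:.1f}  balanced accuracy={value:.3f}")
print(f"best threshold: {best}")
```

On this toy data the sweep picks 0.4: it still detects both positives while rejecting six of the eight negatives, whereas 0.5 already loses one positive.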

4. Companion metrics

| Metric | Measures | Caveat on imbalanced data |
| --- | --- | --- |
| Accuracy | Overall hit rate | Dominated by the majority class |
| Recall / Sensitivity | Detection rate per class | Requires separate reporting for each class |
| Balanced Accuracy | Mean recall across classes | Highlights minority-class recall loss |
| Macro F1 | Harmonic mean of precision & recall (per class) | Useful when precision also matters |
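
The metrics in the table can be computed side by side from the same predictions. The sketch below does so in plain Python on toy binary labels. The data and the helper name `companion_metrics` are illustrative assumptions; scikit-learn's `accuracy_score`, `recall_score`, `balanced_accuracy_score`, and `f1_score` (with `average="macro"`) are the usual library equivalents.

```python
def companion_metrics(y_true, y_pred):
    """Accuracy, per-class recall, Balanced Accuracy and Macro F1."""
    labels = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    recalls, f1_scores = {}, []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recall = tp / (tp + fn)
        # Precision is taken as 0 when class c is never predicted.
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        recalls[c] = recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "recall": recalls,
        "balanced_accuracy": sum(recalls.values()) / len(labels),
        "macro_f1": sum(f1_scores) / len(labels),
    }


# Toy data: 16 negatives and 4 positives; the model finds one positive
# and raises one false alarm.
y_true = [0] * 16 + [1] * 4
y_pred = [0] * 15 + [1] + [1] + [0] * 3

metrics = companion_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```

Here Accuracy is 0.8, but the positive-class recall of 0.25 drags Balanced Accuracy down to about 0.594, and Macro F1 (about 0.608) additionally reflects the false alarm.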

Summary

  • Balanced Accuracy is the average of per-class recall, making it well suited to imbalanced datasets.
  • In Python 3.13, balanced_accuracy_score gives you the value in one line; compare it with Accuracy to show stakeholders the difference.
  • Combine it with precision, recall, and F1 metrics to decide how much weight to give each class when evaluating models.