Summarize distributions with box plots

Box plots are a staple for summarizing a distribution: a single glyph shows the median, the quartiles, and any outliers. Drawing one box per category makes differences in spread easy to compare at a glance.
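The numbers behind each box can also be computed directly, which helps when you want to report them next to the chart. Below is a minimal sketch with pandas. The region labels and MPG values are hypothetical sample data, and the quartiles use pandas' default linear interpolation.

```python
import pandas as pd

# Hypothetical sample: fuel economy (MPG) for two made-up regions
df = pd.DataFrame({
    "region": ["A"] * 5 + ["B"] * 5,
    "mpg": [18, 20, 22, 24, 26, 28, 29, 30, 31, 40],
})

# Per-category quartiles: Q1 and Q3 are the box edges, the median is the inner line
summary = (
    df.groupby("region")["mpg"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
    .rename(columns={0.25: "q1", 0.5: "median", 0.75: "q3"})
)
# The height of each box is the interquartile range
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)
```

Region B has the higher median but the narrower box, which is exactly the kind of difference a side-by-side box plot makes visible.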

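```python
from pathlib import Path

import matplotlib.pyplot as plt
import seaborn as sns

# Load the sample dataset and drop rows missing either plotted column
mpg = sns.load_dataset("mpg").dropna(subset=["mpg", "origin"])

fig, ax = plt.subplots(figsize=(6, 4))
# Map the category to hue as well so the palette colors each box;
# legend=False suppresses the redundant legend (seaborn >= 0.13)
sns.boxplot(data=mpg, x="origin", y="mpg", hue="origin", palette="Set2",
            legend=False, ax=ax)

ax.set_xlabel("Production region")
ax.set_ylabel("Fuel economy (MPG)")
ax.set_title("Fuel economy by region (box plot)")
ax.grid(axis="y", alpha=0.3)

fig.tight_layout()
# Create the output directory if it does not exist yet, then save as SVG
out = Path("static/images/visualize/distribution/boxplot.svg")
out.parent.mkdir(parents=True, exist_ok=True)
fig.savefig(out)
```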

Placing the regions side by side shows how the spread of fuel economy differs between them.

Reading tips

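  • The box spans the first to the third quartile, i.e. the interquartile range (IQR), and the line inside it marks the median. By default (whis=1.5 in seaborn and matplotlib) the whiskers reach the most extreme data points within 1.5×IQR of the box edges; points beyond that are drawn individually as outliers.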
  • If many points are flagged as outliers, inspect them with another chart, such as a strip plot or a histogram, or widen the whisker range with the whis parameter.
  • Drawing the boxes horizontally (values on the x-axis, categories on the y-axis) keeps long category labels readable.
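The whisker rule and the tips above can be sketched as follows. The helper whisker_limits is hypothetical (it is not a seaborn or matplotlib function) and implements the conventional rule: whiskers reach the most extreme observations inside [Q1 − whis·IQR, Q3 + whis·IQR], and anything beyond is treated as an outlier. The DataFrame is made-up sample data, and the final plot draws the boxes horizontally with sns.boxplot's whis parameter set explicitly.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import seaborn as sns


def whisker_limits(values, whis=1.5):
    """Hypothetical helper: return (low whisker, high whisker, outliers).

    Whiskers reach the most extreme observations that lie within
    [Q1 - whis*IQR, Q3 + whis*IQR]; values outside are outliers.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    outliers = x[(x < lo_fence) | (x > hi_fence)]
    return inside.min(), inside.max(), outliers


# Hypothetical data: region B contains one extreme value (40)
df = pd.DataFrame({
    "origin": ["A"] * 5 + ["B"] * 5,
    "mpg": [18, 20, 22, 24, 26, 28, 29, 30, 31, 40],
})

b = df.loc[df["origin"] == "B", "mpg"]
print(whisker_limits(b))            # 40 lies beyond Q3 + 1.5*IQR = 31 + 3 = 34
print(whisker_limits(b, whis=5.0))  # a wider range pulls 40 inside the whisker

# Horizontal layout: numeric values on x, category labels on y
fig, ax = plt.subplots(figsize=(6, 3))
sns.boxplot(data=df, x="mpg", y="origin", orient="h", whis=1.5, ax=ax)
ax.set_xlabel("Fuel economy (MPG)")
ax.set_ylabel("")
fig.tight_layout()
```

Raising whis from 1.5 to 5.0 moves 40 from the outlier set into the whisker, which is the effect of widening the whisker range in the plot as well.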