An empirical cumulative distribution function (ECDF) plots, for every value on the x-axis, the share of samples at or below that value. Because that share can be read straight off the y-axis, the chart is handy for threshold decisions.
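```python
import seaborn as sns
import matplotlib.pyplot as plt

# Example restaurant-bill dataset that seaborn downloads and caches on first use.
tips = sns.load_dataset("tips")

fig, ax = plt.subplots(figsize=(6, 4))
# hue="time" draws one ECDF curve per meal time (Lunch, Dinner).
sns.ecdfplot(data=tips, x="total_bill", hue="time", ax=ax)
ax.set_xlabel("Bill total ($)")
ax.set_ylabel("Cumulative share")
ax.set_title("ECDF of bill totals")
ax.grid(alpha=0.2)
fig.tight_layout()
# The output directory must already exist; savefig does not create it.
fig.savefig("static/images/visualize/distribution/ecdf.svg")
```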
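To make concrete what the curve encodes, here is a minimal sketch that computes an ECDF by hand in plain Python. The function name `ecdf` and the sample bill values are hypothetical, chosen only for illustration; they are not part of seaborn.

```python
def ecdf(values):
    """Return (sorted values, cumulative shares) for a list of numbers."""
    xs = sorted(values)
    n = len(xs)
    # The i-th smallest sample (1-based) has i of the n samples at or below it.
    # For tied values, the largest share among them is the ECDF value there.
    shares = [i / n for i in range(1, n + 1)]
    return xs, shares


# Hypothetical bill totals, for illustration only.
bills = [12.5, 8.0, 30.0, 21.0, 16.5]
for value, share in zip(*ecdf(bills)):
    print(f"${value:.2f}: {share:.0%} of bills are at or below this amount")
```

Plotting the sorted values against the shares as a step line should give the same kind of curve as the chart above, assuming the default proportion scaling on the y-axis.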
Reading tips
- Steep segments indicate that many samples cluster in that range, while flat segments mean that few samples fall there.
- Statements such as “80% of customers spend less than $30” become easy to justify: find $30 on the x-axis and read the cumulative share off the curve.
- When comparing many series, limit the number of colors and rely on legends and line styles for clarity.
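
The threshold reading described in the tips above can also be checked numerically. The following is a minimal sketch in plain Python; the helper names `share_at_or_below` and `value_at_share` and the bill values are hypothetical, made up for illustration. Note that an ECDF counts samples at or below the threshold, so when samples sit exactly on the threshold the result differs slightly from a strict "less than" statement.

```python
def share_at_or_below(values, threshold):
    """ECDF evaluated at `threshold`: fraction of samples <= threshold."""
    return sum(v <= threshold for v in values) / len(values)


def value_at_share(values, share):
    """Smallest observed value whose cumulative share reaches `share`."""
    xs = sorted(values)
    n = len(xs)
    for i, x in enumerate(xs, start=1):
        if i / n >= share:
            return x
    # Only reached if `share` is above 1; fall back to the maximum.
    return xs[-1]


# Hypothetical bill totals, for illustration only.
bills = [8.0, 12.5, 16.5, 18.0, 21.0, 24.0, 27.5, 29.0, 35.0, 48.0]
print(share_at_or_below(bills, 30))  # 0.8
print(value_at_share(bills, 0.8))    # 29.0
```

The first helper answers "what share falls under this threshold?" (reading up from the x-axis), and the second answers "which threshold covers this share?" (reading across from the y-axis).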