Binning

import numpy as np
import pandas as pd

df = pd.read_csv("../data/sample.csv")
df.head()

	元号	和暦	西暦	人口総数	町名
0	大正	9.0	1920.0	394748	A町
1	大正	9.0	1920.0	31421	B町
2	大正	9.0	1920.0	226993	C町
3	大正	9.0	1920.0	253689	D町
4	大正	9.0	1920.0	288602	E町

Binning data by quantile

pandas.qcut

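pandas.qcut sorts the values and cuts them at quantiles, so each bin holds roughly the same share of the data (for example, 10% per bin when there are 10 bins). The quantiles of 人口総数 in 10% steps are: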

0.0        46.0
0.1     18002.9 <- value at 10%
0.2     20476.8
0.3     22755.0
0.4     26204.8
0.5     30824.0
0.6     45622.6
0.7     89873.9
0.8    245544.0
0.9    290714.1 <- value at 90%
1.0    765403.0
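
A quantile listing like the one above can be obtained with Series.quantile. The following is a minimal sketch, assuming a small made-up series (`population`) in place of the 人口総数 column from sample.csv, so the printed numbers will not match the listing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df["人口総数"]; these are NOT the values from sample.csv.
population = pd.Series(
    [46, 1200, 18000, 20500, 22800, 26200, 30800, 45600, 89900, 245500, 765403]
)

# Quantile levels 0.0, 0.1, ..., 1.0: 11 levels, i.e. the edges of 10 bins.
levels = np.linspace(0, 1, 11)

# Series.quantile accepts an array of levels and returns a Series indexed by level.
deciles = population.quantile(levels)
print(deciles)
```

The first and last rows are the minimum and maximum of the column, which is why they also appear as the outermost bin edges.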

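# q=11 yields 11 equal-frequency bins, so the bin edges fall at multiples of 1/11
# rather than at the 10% quantiles listed above.
df["人口総数_ビン化"] = pd.qcut(df["人口総数"], q=11)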
df[["人口総数", "人口総数_ビン化"]].head()

	人口総数	人口総数_ビン化
0	394748	(294187.0, 765403.0]
1	31421	(28169.0, 34470.0]
2	226993	(214984.0, 249929.0]
3	253689	(249929.0, 294187.0]
4	288602	(249929.0, 294187.0]
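
To check that quantile binning really produces bins of near-equal size, the bin counts and edges can be inspected. This is a rough sketch on synthetic, skewed data (the `values` series and the random seed are assumptions for illustration, not the tutorial's dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical skewed data standing in for a population column.
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000))

# q=10 asks for 10 equal-frequency bins; retbins=True also returns the bin edges.
binned, edges = pd.qcut(values, q=10, retbins=True)

# Count rows per bin, keeping the bins in their natural (ascending) order.
counts = binned.value_counts(sort=False)
print(counts)
print(edges)
```

Even though the data is heavily skewed, each bin receives about the same number of rows; only the bin widths differ. Passing labels=False to pd.qcut returns integer bin codes instead of interval labels, which is often more convenient as a model feature.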
