I have the following DataFrame, where the cities are the columns and the ages are the values:
| City1 | City2 | City3 |
|---|---|---|
| 2 | 14 | 61 |
| 51 | 73 | 35 |
| 42 | 38 | 13 |
| 12 | 75 | 24 |
| 27 | 42 | 78 |
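For anyone who wants to reproduce this, here is a minimal sketch of how the sample table could be built. The variable name `df` and the column labels `City1` to `City3` are assumptions, chosen to match what the answers on this page use.

```python
import pandas as pd

# Sample ages from the question, one column per city.
# The name `df` and the labels City1..City3 are assumed for illustration.
df = pd.DataFrame(
    {
        "City1": [2, 51, 42, 12, 27],
        "City2": [14, 73, 38, 75, 42],
        "City3": [61, 35, 13, 24, 78],
    }
)
print(df)
```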
I want to create a new DataFrame in which the columns are age groups and the cities are the index, like this:
| | 0-20 | 20-40 | 40-60 | 60-80 |
|---|---|---|---|---|
| City1 | 2 | 1 | 1 | 0 |
| City2 | 1 | 1 | 1 | 0 |
| City3 | 1 | 2 | 0 | 2 |
Can this be done in pandas?
Answer from a uj5u.com user:
Try this, using pd.cut:
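import numpy as np
import pandas as pd
# Stack all cities into one long Series, then assign every age to a bin
dfc = pd.cut(df.rename_axis('Cities', axis=1).stack(),
             bins=[-np.inf,20,40,60,np.inf],
             labels='0-20 20-40 40-60 60-80'.split(' ')).reset_index()
# Count how many ages fall into each bin for each city
pd.crosstab(dfc['Cities'], dfc[0]).reset_index()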
Output:
0 Cities  0-20  20-40  40-60  60-80
0  City1     2      1      2      0
1  City2     1      1      1      2
2  City3     1      2      0      2
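One detail worth checking with `pd.cut` is which bin an age that sits exactly on an edge lands in. By default the bins are closed on the right, and passing `right=False` closes them on the left instead. The small sketch below illustrates this; the boundary ages 20, 40 and 60 are made up for the demonstration and do not appear in the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical ages that sit exactly on the bin edges.
ages = pd.Series([20, 40, 60])
bins = [-np.inf, 20, 40, 60, np.inf]
labels = ["0-20", "20-40", "40-60", "60-80"]

# Default (right=True): intervals look like (a, b], so 20 falls into "0-20".
right_closed = pd.cut(ages, bins=bins, labels=labels)
# right=False: intervals look like [a, b), so 20 falls into "20-40".
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(right_closed.tolist())  # ['0-20', '20-40', '40-60']
print(left_closed.tolist())   # ['20-40', '40-60', '60-80']
```

The sample data in the question contains no age on an edge, so both conventions give the same counts there.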
Answer from a uj5u.com user:
Here is a solution that applies pd.Series.between to every combination of age range and city.
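new_data = []
for city in df.columns:
    new_city = []
    # inclusive="left" counts ages in [left, right); the string form needs pandas 1.3 or later
    for left, right in [(0,20),(20,40),(40,60),(60,80)]:
        new_city.append(df[city].between(left,right, inclusive="left").sum())
    new_data.append(new_city)
# pass df.columns directly; wrapping it in a list would build a one-level MultiIndex
new_df = pd.DataFrame(new_data, columns=["0-20","20-40","40-60","60-80"], index=df.columns)
new_df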
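As a possible alternative to the explicit inner loop, the same counts can be sketched with `np.histogram`, which is a different technique from the one in this answer. NumPy treats each bin as left-closed, except the last one, which also includes its right edge (so an age of exactly 80 would be counted here, unlike with `between(..., inclusive="left")`). The sample `df` and its column labels are assumed from the question.

```python
import numpy as np
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame(
    {
        "City1": [2, 51, 42, 12, 27],
        "City2": [14, 73, 38, 75, 42],
        "City3": [61, 35, 13, 24, 78],
    }
)

edges = [0, 20, 40, 60, 80]
labels = ["0-20", "20-40", "40-60", "60-80"]

# np.histogram returns (counts, edges); keep only the counts for each city.
counts = {city: np.histogram(df[city], bins=edges)[0] for city in df.columns}

# One row per city, one column per age group.
new_df = pd.DataFrame.from_dict(counts, orient="index", columns=labels)
print(new_df)
```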
Answer from a uj5u.com user:
#this should work
import pandas as pd
#creating df
data = [[2, 14, 61], [51, 73, 35], [42, 38, 13], [12, 75, 24], [27, 42, 78]]
df = pd.DataFrame(data, columns = ['city1', 'city2', 'city3'])
#counting, for each city, the values that fall into each interval (low, high]
bins = [(0, 20), (20, 40), (40, 60), (60, 80)]
data_new = [[df[(df[city] > low) & (df[city] <= high)][city].count() for low, high in bins]
            for city in df.columns]
#creating a new df with new data
df_new = pd.DataFrame(data_new, index= ['city1', 'city2', 'city3'], columns= ['0-20', '20-40', '40-60', '60-80'])
#so the point is to add this "index= ['city1', 'city2', 'city3']," between data and columns when you create a new dataframe
