Pandas: Restructuring a DataFrame into Column Values

I have the following DataFrame, where the cities are the columns and the ages are the values:

City1	City2	City3
2	14	61
51	73	35
42	38	13
12	75	24
27	42	78

I want to create a new DataFrame in which the columns are age groups and the cities are the index, like this:

	0-20	20-40	40-60	60-80
City1	2	1	2	0
City2	1	1	1	2
City3	1	2	0	2

Can this be done in pandas?

A helpful uj5u.com user replied:

Try this, using pd.cut:

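import numpy as np
import pandas as pd

# Stack into long form (one row per value), then label each age with its bin
dfc = pd.cut(df.rename_axis('Cities', axis=1).stack(),
             bins=[-np.inf, 20, 40, 60, np.inf],
             labels='0-20 20-40 40-60 60-80'.split(' ')).reset_index()

# Count how many values fall in each (city, age group) pair
pd.crosstab(dfc['Cities'], dfc[0]).reset_index()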

Output:

0 Cities  0-20  20-40  40-60  60-80
0  City1     2      1      2      0
1  City2     1      1      1      2
2  City3     1      2      0      2
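
For reference, here is a self-contained sketch of the same pd.cut plus pd.crosstab idea that leaves the cities as the index, which is the layout the question asks for. The frame is rebuilt from the question's data; the names stacked, age, group and result are illustrative choices rather than anything from the answer, and melt is swapped in for stack only to get named columns.

```python
import numpy as np
import pandas as pd

# Data from the question (column names assumed to be City1..City3)
df = pd.DataFrame({"City1": [2, 51, 42, 12, 27],
                   "City2": [14, 73, 38, 75, 42],
                   "City3": [61, 35, 13, 24, 78]})

labels = ["0-20", "20-40", "40-60", "60-80"]

# Long form: one row per value, with the originating city in "Cities"
stacked = df.melt(var_name="Cities", value_name="age")

# Label each age with its bin; bins are right-closed by default: (-inf, 20], (20, 40], ...
stacked["group"] = pd.cut(stacked["age"],
                          bins=[-np.inf, 20, 40, 60, np.inf],
                          labels=labels)

# Rows are cities, columns are age groups; no reset_index, so cities stay as the index
result = pd.crosstab(stacked["Cities"], stacked["group"])
print(result)
```

Dropping the final reset_index() is the only behavioural difference from the answer above: the counts are the same, but the city names become row labels instead of an ordinary column.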

A helpful uj5u.com user replied:

Here is a solution that applies pd.Series.between to every combination of range and city.

new_data = []
for city in df.columns:
    new_city = []
    for left, right in [(0,20),(20,40),(40,60),(60,80)]:
        # inclusive="left" counts values with left <= value < right
        new_city.append(df[city].between(left,right, inclusive="left").sum())
    new_data.append(new_city)
# pass df.columns directly; wrapping it in a list would build a one-level MultiIndex
new_df = pd.DataFrame(new_data, columns=["0-20","20-40","40-60","60-80"], index=df.columns)
new_df
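
One detail worth checking is how boundary values are binned. pd.cut closes each bin on the right by default, while between(..., inclusive="left") closes it on the left, so a value of exactly 20, 40 or 60 would be counted in different groups. The question's data contains no such values, so the results agree there. The sketch below uses made-up boundary values (not from the question) to show the difference; as far as I know, string values for inclusive require pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical ages sitting exactly on the bin edges (not part of the question's data)
ages = pd.Series([20, 40, 60])

bins = [0, 20, 40, 60, 80]
labels = ["0-20", "20-40", "40-60", "60-80"]

# Default right=True: bins are (0, 20], (20, 40], (40, 60], (60, 80]
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: bins are [0, 20), [20, 40), [40, 60), [60, 80)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(right_closed.tolist())  # ['0-20', '20-40', '40-60']
print(left_closed.tolist())   # ['20-40', '40-60', '60-80']

# between with inclusive="left" matches the left-closed convention:
# 20 falls in [20, 40), while 40 and 60 do not
in_20_40 = ages.between(20, 40, inclusive="left")
print(in_20_40.tolist())      # [True, False, False]
```

If the age groups are meant to be left-closed, passing right=False to pd.cut keeps it consistent with the between-based loop.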

A helpful uj5u.com user replied:

#this should work

import pandas as pd

#creating df
data = [[2, 14, 61], [51, 73, 35], [42, 38, 13], [12, 75, 24], [27, 42, 78]]

df = pd.DataFrame(data, columns = ['city1', 'city2', 'city3'])

#counting values in each interval (low < value <= high), one row per city

intervals = [(0, 20), (20, 40), (40, 60), (60, 80)]
data_new = [[((df[city] > low) & (df[city] <= high)).sum() for low, high in intervals]
            for city in ['city1', 'city2', 'city3']]

#creating a new df with new data

df_new = pd.DataFrame(data_new, index= ['city1', 'city2', 'city3'], columns= ['0-20', '20-40', '40-60', '60-80'])

#so the point is to pass index=['city1', 'city2', 'city3'] alongside data and columns when you create the new dataframe
