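I have a column containing codes (DMDEDUC2 in this example). I want to compute a frequency table (value_counts) on this column and display it with user-specified labels. The code below does exactly what I want... but I feel I must be missing a more standard way to achieve the intended result.
Logically, the value_counts and replace lines can't be simplified. But surely the rest could be more elegant.
Is there a simpler way to get this result? A more pandas-like solution?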
# Tiny dataset for clarity
import pandas as pd
df = pd.DataFrame({ 'DMDEDUC2': [5, 3, 3, 5, 4, 2, 4, 4] })
d = {
1: "<9"
, 2: "9-11"
, 3: "HS/GED"
, 4: "Some college/AA"
, 5: "College"
, 7: "Refused"
, 9: "Don't know"
}
# First get value counts (vc) for DMDEDUC2
# This line gets all the data I need in the correct order...
# but without the labels I need.
vc = df.DMDEDUC2.value_counts().sort_index()
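# Convert the resulting Series to a DataFrame
# to allow for clear labels in a logical order.
# Naming the column here works across pandas versions
# (since pandas 2.0, value_counts() names its result 'count',
# so renaming a 'DMDEDUC2' column afterwards would do nothing).
vc = vc.to_frame(name='COUNTS')
vc['DMDEDUC2x'] = vc.index
vc.DMDEDUC2x = vc.DMDEDUC2x.replace(d)
vc = vc.set_index('DMDEDUC2x')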
print(vc)
Expected output (sorted by the [non-displayed] code, not by count or label):
                 COUNTS
DMDEDUC2x
<9                  655
9-11                643
HS/GED             1186
Some college/AA    1621
College            1366
Don't know            3
Expected output for the tiny sample dataset:
                 COUNTS
DMDEDUC2x
9-11                  1
HS/GED                2
Some college/AA       3
College               2
A uj5u.com user replied:
I think it can easily be condensed to two lines:
vc = df.DMDEDUC2.value_counts().sort_index().to_frame(name='COUNTS')
vc.index = vc.index.map(d).rename('DMDEDUC2')
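For comparison, here is a minimal sketch of the same idea written as a single method chain. It is not from the answer above. It assumes that relabeling the sorted index with `Series.rename(index=d)` is acceptable in place of `replace`, and it sets the index name to `DMDEDUC2x` so the result matches the expected output in the question.

```python
import pandas as pd

# Tiny dataset and label mapping copied from the question
df = pd.DataFrame({'DMDEDUC2': [5, 3, 3, 5, 4, 2, 4, 4]})
d = {
    1: "<9",
    2: "9-11",
    3: "HS/GED",
    4: "Some college/AA",
    5: "College",
    7: "Refused",
    9: "Don't know",
}

vc = (
    df['DMDEDUC2']
    .value_counts()            # frequency of each numeric code
    .sort_index()              # order by code, before the codes are relabeled
    .rename(index=d)           # swap each code for its label
    .rename_axis('DMDEDUC2x')  # index name assumed from the question's output
    .to_frame(name='COUNTS')   # one-column DataFrame with an explicit column name
)
print(vc)
```

Sorting before relabeling is what keeps the rows in code order; sorting afterwards would order them alphabetically by label instead.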
