I have a DataFrame:
state city score
CA San Francisco 80
CA San Francisco 90
...
NC Raleigh 44
NY New York City 22
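I want to do something like groupby.head(), but instead of an integer count I want to select the top 80% of each state-city combination, sorted by score.
So if CA, San Francisco has 100 rows and NC, Raleigh has 20 rows, the final DataFrame would contain the 80 highest-scoring rows for CA, San Francisco and the 16 highest-scoring rows for NC, Raleigh.
So the final code might look something like this (pseudocode):
df.sort_values('score', ascending=False).groupby(['state', 'city']).head(80%)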
Thanks!
Answer from a uj5u.com user:
from io import StringIO
import pandas as pd
# sample data
s = """state,city,score
CA,San Francisco,80
CA,San Francisco,90
CA,San Francisco,30
CA,San Francisco,10
CA,San Francisco,70
CA,San Francisco,60
CA,San Francisco,50
CA,San Francisco,40
NC,Raleigh,44
NC,Raleigh,54
NC,Raleigh,64
NC,Raleigh,14
NY,New York City,22
NY,New York City,12
NY,New York City,32
NY,New York City,42
NY,New York City,52"""
df = pd.read_csv(StringIO(s))
sample = .8 # 80%
# sort the values and create a groupby object
g = df.sort_values('score', ascending=False).groupby(['state', 'city'])
# use list comprehension to iterate over each group
# for each group, calculate what 80% is
# in other words, the length of each group multiplied by .8
# you then use int to round down to the whole number
new_df = pd.concat([data.head(int(len(data)*sample)) for _,data in g])
print(new_df)
state city score
1 CA San Francisco 90
0 CA San Francisco 80
4 CA San Francisco 70
5 CA San Francisco 60
6 CA San Francisco 50
7 CA San Francisco 40
10 NC Raleigh 64
9 NC Raleigh 54
8 NC Raleigh 44
16 NY New York City 52
15 NY New York City 42
14 NY New York City 32
12 NY New York City 22
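One thing to keep in mind with `int(len(data) * sample)`: `int` truncates, so a group with a single row keeps zero rows. Here is a minimal sketch comparing truncation with rounding up via `math.ceil`. The helper name `rows_to_keep` is hypothetical and only for illustration; it is not part of pandas.

```python
import math

def rows_to_keep(n_rows, frac=0.8, round_up=False):
    """Number of rows to keep from a group of n_rows (hypothetical helper)."""
    exact = n_rows * frac
    # int() truncates toward zero; math.ceil() rounds up to the next integer
    return math.ceil(exact) if round_up else int(exact)

# compare both rounding choices for a few group sizes
for n in (1, 4, 5, 100):
    print(n, rows_to_keep(n), rows_to_keep(n, round_up=True))
```

If every group should contribute at least one row, `math.ceil(len(data) * sample)` could replace `int(len(data) * sample)` in the list comprehension above.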
Answer from a uj5u.com user:
Use nlargest, and compute the number of rows to select for each group from the group's length, i.e. 0.8 * len(group)
res = (
    # column names must match the DataFrame exactly (lowercase here)
    df.groupby(['state', 'city'], group_keys=False)
    .apply(lambda g: g.nlargest(int(0.8*len(g)), "score"))
)
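Both answers run Python code once per group. As an alternative sketch (not taken from either answer), the same selection can be done with a boolean mask: sort once, number the rows within each group with `cumcount`, and compare that position against 80% of the group size. The small DataFrame below is made-up sample data, and the names `ordered`, `position`, and `keep` are illustrative assumptions.

```python
import pandas as pd

# made-up sample data: 5 rows for CA / San Francisco, 4 rows for NC / Raleigh
df = pd.DataFrame({
    "state": ["CA"] * 5 + ["NC"] * 4,
    "city": ["San Francisco"] * 5 + ["Raleigh"] * 4,
    "score": [80, 90, 30, 10, 70, 44, 54, 64, 14],
})

frac = 0.8  # fraction of each group to keep

# sort once so that the best scores come first within every group
ordered = df.sort_values("score", ascending=False)
g = ordered.groupby(["state", "city"])

# 0-based position of each row inside its group, in sorted order
position = g.cumcount()
# rows to keep per group: group size * frac, truncated to an integer
keep = (g["score"].transform("size") * frac).astype(int)

res = ordered[position < keep]
print(res)
```

With this sample data the mask keeps 4 of the 5 CA rows and 3 of the 4 NC rows, matching what `int(0.8 * len(group))` gives in the answers above.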
