我有一個資料框,例如
Groups Values
A cow
A cow
A cow
A dog
B cat
B cat
B cat
B fish
C cow
C fish
C cat
我想添加一個新列(計數),在其中計算每個列中的奶牛 數量。tab['Values']tab['Groups']
然后我應該得到:
Groups Values Count
A cow 3
A cow 3
A cow 3
A dog 3
B cat 0
B cat 0
B cat 0
B fish 0
C cow 1
C fish 1
C cat 1
非常感謝你的幫助
uj5u.com熱心網友回復:
另一種方法是在過濾資料框后使用value_counts和reindex:
df['count'] = df[df['Values'] == 'cow'].value_counts('Groups') \
.reindex(df['Groups'], fill_value=0).values
print(df)
# Output:
Groups Values count
0 A cow 3
1 A cow 3
2 A cow 3
3 A dog 3
4 B cat 0
5 B cat 0
6 B cat 0
7 B fish 0
8 C cow 1
9 C fish 1
10 C cat 1
表現
676組,10k條記錄
# with groupby, @jezrael
%timeit df['Values'].eq('cow').groupby(df['Groups']).transform('sum')
1.58 ms ± 25 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# with value_counts, @Corralien
%timeit df[df['Values'] == 'cow'].value_counts('Groups').reindex(df['Groups'], fill_value=0).values
1.37 ms ± 3.58 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Setup, so ugly
import strings
groups = [a b for a,b in zip(np.random.choice(list(string.ascii_uppercase), 10000), np.random.choice(list(string.ascii_uppercase), 10000))]
animals = np.random.choice(pd.read_csv('https://gist.githubusercontent.com/atduskgreg/3cf8ef48cb0d29cf151bedad81553a54/raw/82f142562cf50b0f6fb8010f890b2f934093553e/animals.txt', error_bad_lines=False, header=None, squeeze=True), 10000)
df = pd.DataFrame({'Groups': groups, 'Values': animals})
uj5u.com熱心網友回復:
使用GroupBy.transform帶面具計數True人民共同sum:
df['count'] = df['Values'].eq('cow').groupby(df['Groups']).transform('sum')
print (df)
Groups Values count
0 A cow 3
1 A cow 3
2 A cow 3
3 A dog 3
4 B cat 0
5 B cat 0
6 B cat 0
7 B fish 0
8 C cow 1
9 C fish 1
10 C cat 1
轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/369420.html
