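I have a dataset that looks like this:
| Criteria | Answer | Frequency |
|---|---|---|
| Criteria 1 | Answer A | any integer |
| Criteria 2 | Answer B | any integer |
For each criterion, the survey offers answer options A through E, and there are 4 criteria in total. However, each criterion appears in multiple rows of the table.
I am trying to work out how to collapse the information by criterion: for each criterion, what is the distribution of the answers respondents gave, expressed as percentages?
I tried using groupby, but to no avail.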
Reply from a uj5u.com user:
Note that I mistakenly used float values for frequency instead of integers.
Sample data:
Criteria Answer frequency
0 Criteria 1 Answer A 0.1
1 Criteria 1 Answer A 0.2
2 Criteria 1 Answer A 0.6
3 Criteria 1 Answer A 0.3
4 Criteria 1 Answer B 0.7
5 Criteria 1 Answer B 0.4
6 Criteria 1 Answer B 0.9
7 Criteria 2 Answer A 0.1
8 Criteria 2 Answer A 0.1
9 Criteria 2 Answer A 0.1
10 Criteria 2 Answer C 0.1
11 Criteria 2 Answer C 0.4
12 Criteria 2 Answer C 0.7
# Range (max minus min) of frequency within each Criteria/Answer group
df.groupby(["Criteria", "Answer"])[["frequency"]].agg(lambda x: x.max() - x.min())
Output:
frequency
Criteria Answer
Criteria 1 Answer A 0.5
Answer B 0.5
Criteria 2 Answer A 0.0
Answer C 0.6
# Median of frequency within each Criteria/Answer group
df.groupby(["Criteria", "Answer"])[["frequency"]].median()
Output:
frequency
Criteria Answer
Criteria 1 Answer A 0.25
Answer B 0.70
Criteria 2 Answer A 0.10
Answer C 0.40
# Sample standard deviation of frequency within each Criteria/Answer group
df.groupby(["Criteria", "Answer"])[["frequency"]].std()
Output:
frequency
Criteria Answer
Criteria 1 Answer A 2.160247e-01
Answer B 2.516611e-01
Criteria 2 Answer A 1.699675e-17
Answer C 3.000000e-01
Adding .reset_index() turns the group keys back into ordinary columns:
df.groupby(["Criteria", "Answer"])[["frequency"]].std().reset_index()
Output:
Criteria Answer frequency
0 Criteria 1 Answer A 2.160247e-01
1 Criteria 1 Answer B 2.516611e-01
2 Criteria 2 Answer A 1.699675e-17
3 Criteria 2 Answer C 3.000000e-01
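The per-group summaries above can also be computed in a single pass with named aggregation. This is only a sketch: it rebuilds the sample data inline, and the output column names (`value_range`, `median`, `std`) are my own choice rather than anything from the original answer.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "Criteria": ["Criteria 1"] * 7 + ["Criteria 2"] * 6,
    "Answer": ["Answer A"] * 4 + ["Answer B"] * 3
              + ["Answer A"] * 3 + ["Answer C"] * 3,
    "frequency": [0.1, 0.2, 0.6, 0.3, 0.7, 0.4, 0.9,
                  0.1, 0.1, 0.1, 0.1, 0.4, 0.7],
})

# One output column per statistic; the column names are illustrative.
summary = (
    df.groupby(["Criteria", "Answer"])
      .agg(
          value_range=("frequency", lambda s: s.max() - s.min()),
          median=("frequency", "median"),
          std=("frequency", "std"),
      )
      .reset_index()
)
print(summary)
```

Selecting the `frequency` column explicitly keeps the string columns out of the numeric reductions, which avoids type errors on recent pandas versions.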
Reply from a uj5u.com user:
IIUC (if I understand correctly):
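# Count the rows for each (Criteria, Answer) pair
df = df.groupby(['Criteria', 'Answer']).size().reset_index(name='size')
# Divide each count by its criterion's total to get a relative frequency
df['frequency'] = df['size'] / df.groupby('Criteria')['size'].transform('sum')
df.drop(columns=['size'], inplace=True)
print(df)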
OUTPUT
Criteria Answer frequency
0 Criteria 1 Answer A 0.6
1 Criteria 1 Answer B 0.4
2 Criteria 2 Answer A 0.4
3 Criteria 2 Answer B 0.4
4 Criteria 2 Answer C 0.2
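If a wide table with one row per criterion is easier to read, `pd.crosstab` with `normalize="index"` gives the same shares directly. This is an alternative I am suggesting, not part of the original answer; the data is the setup data rebuilt inline, and multiplying by 100 is only there to show percentages as the question asked.

```python
import pandas as pd

# Same rows as the SETUP data, one row per respondent answer.
df = pd.DataFrame({
    "Criteria": ["Criteria 1"] * 5 + ["Criteria 2"] * 5,
    "Answer": ["Answer A"] * 3 + ["Answer B"] * 2
              + ["Answer A"] * 2 + ["Answer B"] * 2 + ["Answer C"],
})

# normalize="index" divides each row by its row total, so every
# criterion's answers sum to 1; scale by 100 for percentages.
shares = pd.crosstab(df["Criteria"], df["Answer"], normalize="index") * 100
print(shares)
```

Answers that never occur for a criterion show up as 0 instead of being missing, which can be convenient when comparing criteria side by side.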
SETUP
from io import StringIO

import pandas as pd

data = """
Criteria\tAnswer
Criteria 1\tAnswer A
Criteria 1\tAnswer A
Criteria 1\tAnswer A
Criteria 1\tAnswer B
Criteria 1\tAnswer B
Criteria 2\tAnswer A
Criteria 2\tAnswer A
Criteria 2\tAnswer B
Criteria 2\tAnswer B
Criteria 2\tAnswer C
"""
df = pd.read_csv(StringIO(data), sep='\t')
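The setup above has one row per response, whereas the question's table already carries an integer Frequency column with several rows per criterion. A minimal sketch for that layout, assuming Frequency holds counts: sum the counts per (Criteria, Answer) pair, then divide by each criterion's total. The numbers below are made up for illustration, and the `Percent` column name is my own.

```python
import pandas as pd

# Hypothetical counts in the question's layout (values are invented).
df = pd.DataFrame({
    "Criteria": ["Criteria 1", "Criteria 1", "Criteria 1",
                 "Criteria 2", "Criteria 2"],
    "Answer": ["Answer A", "Answer A", "Answer B", "Answer A", "Answer C"],
    "Frequency": [3, 1, 4, 2, 6],
})

# Collapse duplicate (Criteria, Answer) rows by summing their counts.
totals = df.groupby(["Criteria", "Answer"], as_index=False)["Frequency"].sum()

# transform("sum") broadcasts each criterion's total back onto its rows,
# so the division stays aligned row by row.
totals["Percent"] = (
    100 * totals["Frequency"]
    / totals.groupby("Criteria")["Frequency"].transform("sum")
)
print(totals)
```

With these invented counts, Criteria 1 splits 50/50 between Answer A and Answer B, and Criteria 2 splits 25/75 between Answer A and Answer C.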
