I have this DataFrame:
import pandas as pd

record = {
    'F1': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'],
    'F2': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7'],
    'Sex': ['F', 'M', 'F', 'M', 'M', 'M', 'F'],
}
# Create the DataFrame
df = pd.DataFrame(record)
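For example, I want to create 2 samples of this DataFrame while keeping a fixed 50-50 ratio on the Sex column. I tried this:
df_dict = {}
for i in range(2):
    df_dict['df{}'.format(i)] = df.sample(frac=0.50, random_state=123)
But the output I get does not match what I expect: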
df_dict["df0"]
# Output:
   F1  F2 Sex
1  x2  a2   M
3  x4  a4   M
4  x5  a5   M
0  x1  a1   F
Any help?
A uj5u.com user replied:
This may not be the best approach, but I believe it can help you solve the problem:
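n = 2
# Sample half of each sex, then keep at most n rows from each
fDf = df[df["Sex"] == "F"].sample(frac=0.5, random_state=123).iloc[:n]
mDf = df[df["Sex"] == "M"].sample(frac=0.5, random_state=123).iloc[:n]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
pd.concat([fDf, mDf])
Output: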
   F1  F2 Sex
0  x1  a1   F
2  x3  a3   F
5  x6  a6   M
1  x2  a2   M
A uj5u.com user replied:
This should also work:
n = 2
df.groupby('Sex', group_keys=False).apply(lambda x: x.sample(n))
A uj5u.com user replied:
Don't use frac, which gives you a fraction of each group; n gives you a fixed number of rows per group:
df.groupby('Sex').sample(n=2)
Example output:
   F1  F2 Sex
2  x3  a3   F
0  x1  a1   F
3  x4  a4   M
4  x5  a5   M
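To tie this back to the question, here is a minimal sketch of building the two samples in df_dict with groupby().sample(). It assumes 2 rows per Sex value, and the per-iteration seed 123 + i is an arbitrary choice made for this illustration. Note that reusing random_state=123 on every iteration, as in the question, makes every sample identical.

```python
import pandas as pd

record = {
    'F1': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'],
    'F2': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7'],
    'Sex': ['F', 'M', 'F', 'M', 'M', 'M', 'F'],
}
df = pd.DataFrame(record)

n_per_group = 2  # rows drawn from each Sex value, so F and M end up 50-50
df_dict = {}
for i in range(2):
    # Assumption: 123 + i is an arbitrary seed; it only needs to differ per
    # iteration so that the samples are not forced to be the same.
    df_dict['df{}'.format(i)] = df.groupby('Sex').sample(
        n=n_per_group, random_state=123 + i
    )

# Check the Sex balance of every sample
for name, sample in df_dict.items():
    print(name, sample['Sex'].value_counts().to_dict())
```

Each sample has 2 F rows and 2 M rows. Which rows are picked depends on the seed and on the pandas/NumPy version, so no specific output is shown here.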
Using custom ratios:
import numpy as np

ratios = {'F': 0.4, 'M': 0.6}  # should sum to 1
# total number of rows desired
total = 4
# Note that the exact number of rows in the output depends on
# the rounding method used to convert to int:
# round usually gives the requested total, while floor/ceil can
# under/over-sample (the example output below has 5 rows, not 4)
s = pd.Series(ratios) * total
# convert to integer (choose your method: ceil/floor/round...)
s = np.ceil(s).astype(int)
df.groupby('Sex').apply(lambda x: x.sample(n=s[x.name])).droplevel(0)
Example output:
   F1  F2 Sex
0  x1  a1   F
6  x7  a7   F
4  x5  a5   M
3  x4  a4   M
1  x2  a2   M
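The effect of the rounding method can be checked directly. A small sketch, reusing the ratios and total from the custom-ratio answer, that compares the per-group counts produced by round, floor and ceil (the column names in counts are just labels chosen for this illustration):

```python
import numpy as np
import pandas as pd

ratios = {'F': 0.4, 'M': 0.6}
total = 4
s = pd.Series(ratios) * total  # F: 1.6, M: 2.4 before rounding

# Per-group sample sizes under each rounding method
counts = pd.DataFrame({
    'round': np.round(s).astype(int),
    'floor': np.floor(s).astype(int),
    'ceil': np.ceil(s).astype(int),
})
print(counts)

# Total rows each method would return
print(counts.sum())
```

With these values, round sums to 4, floor to 3 and ceil to 5, which is why the ceil version returns 5 rows instead of the requested 4.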
