Randomly split model scores into 4 groups with similar distributions in Python

I have a dataset of model scores ranging from 0 to 1. The table looks like this:

| Score |
| ----- |
| 0.55 |
| 0.67 |
| 0.21 |
| 0.05 |
| 0.91 |
| 0.15 |
| 0.33 |
| 0.47 |

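I want to randomly split these scores into 4 groups: control, treatment 1, treatment 2 and treatment 3. The control group should get 20% of the observations, and the remaining 80% must be split among the other 3 groups, which should be of equal size. However, I want the score distribution to be the same in every group. How can I do this in Python?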

PS: This is only a sample of the actual table, which has many more observations than this.

Answer 1:

You can use numpy.random.choice to assign each row to a random group with the given probabilities, and then groupby to split the DataFrame:

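import numpy as np
import pandas as pd

df = pd.DataFrame({'Score': [0.55, 0.67, 0.21, 0.05, 0.91, 0.15, 0.33, 0.47]})

# Draw a group label for every row: 20% control, the other 80% split evenly
group = np.random.choice(['control', 'treatment 1', 'treatment 2', 'treatment 3'],
                         size=len(df),
                         p=[.2, .8/3, .8/3, .8/3])

# One sub-DataFrame per group, keyed by group name
dict(list(df.groupby(pd.Series(group, index=df.index))))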

Possible output (each value in the dictionary is a DataFrame):

{'control':    Score
 2   0.21
 5   0.15,
 'treatment 1':    Score
 7   0.47,
 'treatment 2':    Score
 1   0.67
 3   0.05,
 'treatment 3':    Score
 0   0.55
 4   0.91
 6   0.33}
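Because each row is drawn independently, the scores only even out across groups on average; with a handful of rows, as in the output above, the groups can end up with quite different scores. If the distributions need to match more closely, one standard alternative (not part of this answer) is stratified, or blocked, randomization: sort by score, cut the rows into consecutive blocks of 15, and shuffle a fixed template of 3 control + 4 + 4 + 4 treatment labels within each block. A minimal sketch, assuming a DataFrame with a `Score` column; `stratified_assign`, `GROUPS` and `BLOCK` are hypothetical names:

```python
import numpy as np
import pandas as pd

# Hypothetical names, for illustration only
GROUPS = ['control', 'treatment 1', 'treatment 2', 'treatment 3']
# One block of 15 labels: 3 control (20%) and 4 per treatment (80% / 3 each)
BLOCK = np.repeat(GROUPS, [3, 4, 4, 4])

def stratified_assign(df, col='Score', seed=None):
    """Sort by score, cut into consecutive blocks of 15 rows and shuffle the
    label template within each block, so every group gets every score range."""
    rng = np.random.default_rng(seed)
    order = df[col].argsort().to_numpy()  # row positions, lowest score first
    group = np.empty(len(df), dtype=object)
    for start in range(0, len(df), len(BLOCK)):
        pos = order[start:start + len(BLOCK)]
        # A short final block takes a random subset of the shuffled template
        group[pos] = rng.permutation(BLOCK)[:len(pos)]
    return df.assign(group=group)

# Synthetic scores, only for demonstration
df = pd.DataFrame({'Score': np.random.default_rng(0).uniform(0, 1, 300)})
out = stratified_assign(df, seed=1)
print(out['group'].value_counts())
print(out.groupby('group')['Score'].describe())
```

Every complete block of 15 holds exactly 20% control and 80%/3 per treatment, and since each block spans a narrow score range, each group receives scores from every part of the range. Only the last, incomplete block is left to chance.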

Answer 2:

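I use a list just to illustrate the idea. For each number, you roll a five-sided die: if it comes up 1, the number goes into control. If not, you roll a three-sided die (yes, such a thing probably doesn't exist ;)), and that decides the treatment group.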

import random

scores = [0.23, 0.034, 0.35, 0.75, 0.92, 0.25, 0.9]
control = []
treatment1 = []
treatment2 = []
treatment3 = []
for score in scores:
    dice = random.randint(1, 5)  # five-sided die: 1 means control
    print(dice, 'is dice')
    if dice == 1:
        control.append(score)
    else:
        seconddice = random.randint(1, 3)  # three-sided die picks the treatment
        print(seconddice, 'is second dice')
        if seconddice == 1:
            treatment1.append(score)
        elif seconddice == 2:
            treatment2.append(score)
        else:  # seconddice == 3
            treatment3.append(score)

print(control, 'is control')
print(treatment1, 'is treatment1')
print('and so on')

I made a short list to test it, and the result was:

5 is dice
1 is second dice
1 is dice
4 is dice
3 is second dice
5 is dice
2 is second dice
1 is dice
5 is dice
1 is second dice
3 is dice
1 is second dice
[0.034, 0.92] is control
[0.23, 0.25, 0.9] is treatment1
and so on

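The larger the dataset, the closer the groups will get to the intended 20/80 split and to each other's score distribution.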
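The two rolls do give the split asked for: P(control) = 1/5 = 20%, and each treatment gets 4/5 × 1/3 = 4/15 ≈ 26.7%, which is 80%/3. That makes the scheme equivalent to a single weighted draw with weights 3:4:4:4, the same probabilities as the `p` list in the first answer. A minimal sketch of that single draw using `random.choices`; `assign_groups` is a hypothetical name, and the 15,000 synthetic scores are made up only to show the shares approaching 20% and 26.7% as the list grows:

```python
import random

GROUPS = ['control', 'treatment1', 'treatment2', 'treatment3']

def assign_groups(scores, seed=None):
    """Hypothetical helper: one weighted draw per score, equivalent to the two dice.
    Weights 3:4:4:4 give control 3/15 = 20% and each treatment 4/15 = 80%/3."""
    rng = random.Random(seed)
    labels = rng.choices(GROUPS, weights=[3, 4, 4, 4], k=len(scores))
    groups = {g: [] for g in GROUPS}
    for score, label in zip(scores, labels):
        groups[label].append(score)
    return groups

# Synthetic scores, only to show the group shares settling down on a larger list
rng = random.Random(0)
data = [rng.random() for _ in range(15000)]
groups = assign_groups(data, seed=1)
shares = {g: len(v) / len(data) for g, v in groups.items()}
print(shares)
```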

Answer 3:

Generate the numbers:

import random

randomlist = []
for _ in range(10):
    n = random.uniform(0, 1)
    randomlist.append(n)

randomlist

Split the list into chunks, in this case 4:

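import math

categories = 4
# ceil, not round: round(10 / 4) == 2 (banker's rounding) would give 5 chunks
length = math.ceil(len(randomlist) / categories)

chunks = [randomlist[x:x + length] for x in range(0, len(randomlist), length)]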
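Equal-sized chunks give four groups of about 25% each rather than the 20/80 split in the question. Keeping the same shuffle-then-slice idea, a minimal sketch that shuffles a copy of the real scores (instead of generating new random numbers), puts the first 20% in control and splits the rest three ways; `split_20_80` is a hypothetical name:

```python
import random

def split_20_80(scores, seed=None):
    """Hypothetical helper: shuffle a copy of the scores, put the first 20% in
    control and split the remaining 80% into three groups whose sizes differ by at most 1."""
    rng = random.Random(seed)
    shuffled = list(scores)  # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_control = round(0.2 * len(shuffled))
    control, rest = shuffled[:n_control], shuffled[n_control:]
    # rest[k::3] takes every third element starting at position k
    treatments = [rest[k::3] for k in range(3)]
    return control, treatments

scores = [0.55, 0.67, 0.21, 0.05, 0.91, 0.15, 0.33, 0.47]
control, (t1, t2, t3) = split_20_80(scores, seed=7)
print(control, t1, t2, t3)
```

Since the list is shuffled first, slicing by position is still a random assignment, and the group sizes come out exact instead of approximate. It does not by itself make the score distributions match; on a short list they can still differ noticeably.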

Source: https://www.uj5u.com/qukuanlian/332416.html
