Limiting how often a number appears in a CSV file using Python

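I have a CSV file with 3408 rows and 46 columns, and I want to fill each of these columns randomly with 0s and 1s while restricting how often the number 1 appears. For example, column AI has 3408 records, but the number 1 should appear in only 15% of the rows. Likewise, every other column must match the percentage of 1s that I specify for it.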

What I have done so far is create a CSV file with 3408 rows and 46 columns randomly filled with 0s and 1s, without any percentage constraint.

Any help or suggestions would be great!

Speeding = 15.49%

Driver inattention or decreased alertness in neighborhoods = 13.81%

Pedestrian carelessness when crossing the road = 6.73%

import numpy as np
import pandas as pd

# Create a 2D NumPy array of 3408 rows and 46 columns,
# filled with random values 0 and 1
random_data = np.random.randint(0, 2, size=(3408, 46))
# Create a Dataframe with random values
# using 2D numpy Array
df = pd.DataFrame(random_data, columns=['SPEEDING', 'Driver inattention or decreased alertness in neighborhoods',
                                        'Pedestrian carelessness when crossing the road' , 'Unsafe overtaking', 'Loss of vehicle control' ,
                                        'Refusal of priority' , 'Failure to maintain a safe distance' , 'Non-use of crosswalks' ,
                                        'Playing on the road or walking on the side of the road' ,
                                        'Dangerous maneuvers' , 'Drowsiness' , 'Driving without a license' ,
                                        'Driver inattention when passing a motorcycle' , 'Non respect of the direction imposed to the traffic' ,
                                        'Lane change without signalling' , 'Driving under the influence of alcohol or drugs' , 'Non respect of the road signs' ,
                                        'Driver inattention when leaving the parking area' , 'Driver carelessness when reversing' , 'Traffic in the wrong direction' ,
                                        'Non-respect of the stop sign' , 'Unsafe parking or stopping' , 'Dazzle from lights' ,
                                        'Manual use of mobile phone/ Wearing a headset' , 'Pedestrian crossing the railroad track without precaution' ,
                                        'Other Human Factors' , 'Defective tires (burst)' , 'Defective brakes' , 'Mechanical failures' , 'Defective steering system' ,
                                        'Lack of lighting device' , 'Non-regulation lighting device' , 'Overload' , 'Other Vehicle condition Factors' , 'Weather' ,
                                        'Defective road' , 'Animal crossing' , 'Lack of public lighting' , 'Slippery road surface' , 'Bad road design' , 'Potholes' ,
                                        'Glare of the sun' , 'Obstacle on the road' , 'Deformed roadway' ,
                                        'Other  State of the road infrastructure and atmospheric conditions' , 'Fatality'])
# Display the Dataframe
print(df)

# Save the Dataframe to a csv file
df.to_csv('test.csv')

Answer from a uj5u.com user:

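You can set everything to 0 and use random.sample (see the docs) to pick the row indices that should be set to 1.

So, to set 15% of the 3408 rows to 1 (let's round that to 511), you can get the list of indices like this: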

from random import sample

sample(range(3408), 511)  # range(3408) covers all row indices 0..3407

Edit: you can find alternatives under this question.
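To show how the sampled indices end up in a column, here is a minimal sketch. The helper name column_with_exact_ones and its parameters are made up for illustration and are not part of the answer above; it assumes rounding to the nearest whole row is acceptable.

```python
from random import sample

import numpy as np


def column_with_exact_ones(n_rows, fraction):
    """Return a 0/1 array of length n_rows with round(n_rows * fraction) ones."""
    n_ones = round(n_rows * fraction)
    column = np.zeros(n_rows, dtype=int)
    # sample() returns distinct indices, so exactly n_ones rows are set to 1
    ones_idx = sample(range(n_rows), n_ones)
    column[ones_idx] = 1
    return column


speeding = column_with_exact_ones(3408, 0.15)
print(speeding.sum(), len(speeding))
```

Because the indices are drawn without replacement, the number of 1s is always the rounded target (511 here), only their positions change between runs.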

Answer from a uj5u.com user:

IIUC, try something like this:

1. How I generated my df:

df = pd.DataFrame(np.zeros(shape=(100, 1)), columns=['speeding'])

2. Then:

df['speeding'] = df['speeding'].apply(lambda x: np.random.choice([0,1], p=[0.85,0.15]))

3. Check:

df.value_counts()

4. Result (one possible run; the counts vary because each row is drawn independently):

speeding
0           85
1           15
dtype: int64
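If the share of 1s has to be exact rather than approximate, one option is to build the column with the right number of 1s and then shuffle it. The sketch below puts both approaches side by side; the function names bernoulli_column and exact_column and the fixed seed are assumptions for illustration, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability


def bernoulli_column(n_rows, p_one):
    """Draw every row independently: the share of 1s is only close to p_one."""
    return rng.choice([0, 1], size=n_rows, p=[1 - p_one, p_one])


def exact_column(n_rows, p_one):
    """Place exactly round(n_rows * p_one) ones, then shuffle their positions."""
    n_ones = round(n_rows * p_one)
    col = np.zeros(n_rows, dtype=int)
    col[:n_ones] = 1
    return rng.permutation(col)


approx = bernoulli_column(100, 0.15)
exact = exact_column(100, 0.15)
print("independent draws:", approx.sum(), "ones")
print("fixed count:", exact.sum(), "ones")
```

The first count changes with the seed, while the second always equals the rounded target.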

Answer from a uj5u.com user:

import numpy as np
import sys


def get_percentages_of_ones(row_count):
    """ Create dummy percentage array. """
    percentage_of_ones = np.zeros(row_count)
    percentage_of_ones[0] = 0.15
    percentage_of_ones[1] = 0.50
    percentage_of_ones[2] = 1.0
    return percentage_of_ones


def create_array(percentage_of_ones, row_count, col_count):
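    """ Build a (row_count, col_count) int8 array in which row i holds
    round(percentage_of_ones[i] * col_count) ones at shuffled positions. """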
    arr = np.empty([row_count, col_count], dtype="int8")
    for row_id, po1 in enumerate(percentage_of_ones):
        nb_ones = int(round(po1 * col_count))
        nb_zeros = col_count - nb_ones
        row = np.append(
            np.zeros(nb_zeros, dtype="int8"),
            np.ones(nb_ones, dtype="int8")
        )
        np.random.shuffle(row)
        arr[row_id] = row
    return arr


def display_array(arr, col_count):
    """ Print the transposed array in full, without truncation. """
    np.set_printoptions(threshold=sys.maxsize,
                        edgeitems=col_count,
                        linewidth=95)
    print(np.transpose(arr))


def save_to_csv(fname, data):
    """ Write the data to a CSV file as comma-separated integers. """
    np.savetxt(fname, data, delimiter=",", fmt="%d")


def main():
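    """ Generate the array, display it, and save it to test.csv. """
    row_count = 46    # one array row per CSV column in the question
    col_count = 3408  # one array column per CSV record in the question
    percentage_of_ones = get_percentages_of_ones(row_count)
    arr = create_array(percentage_of_ones, row_count, col_count)
    display_array(arr, col_count)
    # Transpose so the file has 3408 rows and 46 columns, as in the question
    save_to_csv("test.csv", np.transpose(arr))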


if __name__ == "__main__":
    main()
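The same idea can be sketched with pandas so that the named columns from the question are kept. This is only an illustration: make_frame and its seed parameter are hypothetical names, and only the three percentages stated in the question are used, since the other 43 are not given.

```python
import numpy as np
import pandas as pd


def make_frame(percentages, n_rows, seed=None):
    """Build a DataFrame whose columns hold a fixed share of ones.

    percentages maps a column name to the wanted share of ones in percent.
    """
    rng = np.random.default_rng(seed)
    data = {}
    for name, pct in percentages.items():
        n_ones = round(n_rows * pct / 100)
        col = np.zeros(n_rows, dtype="int8")
        col[:n_ones] = 1
        rng.shuffle(col)  # in-place shuffle spreads the ones over random rows
        data[name] = col
    return pd.DataFrame(data)


# Percentages taken from the question; the remaining columns would be added here
percentages = {
    "SPEEDING": 15.49,
    "Driver inattention or decreased alertness in neighborhoods": 13.81,
    "Pedestrian carelessness when crossing the road": 6.73,
}
df = make_frame(percentages, n_rows=3408, seed=42)
print(df.sum())
```

Calling df.to_csv("test.csv", index=False) afterwards would write the frame without the extra index column, so the file keeps one column per factor.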

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/485702.html

Tags: Python, pandas, DataFrame, NumPy, CSV
