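I have a CSV file with 3408 rows and 46 columns, and I want to fill each of these columns randomly with 0s and 1s while limiting how often the number 1 appears. For example, column AI has 3408 records, but the number 1 should appear in only 15% of the rows, and every column must match the percentage of 1s that I specify for it.
So far I have created a CSV file with 3408 rows and 46 columns filled randomly with 0s and 1s, but without any percentage constraint.
Any help or suggestions would be great!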
Speeding = 15.49%
Driver inattention or decreased alertness in neighborhoods = 13.81%
Pedestrian carelessness when crossing the road = 6.73%
.
.
import numpy as np
import pandas as pd

# Create a 2D NumPy array of 3408 rows and 46 columns,
# filled with random values 0 and 1
random_data = np.random.randint(0, 2, size=(3408, 46))
# Create a Dataframe with random values
# using 2D numpy Array
df = pd.DataFrame(random_data, columns=['SPEEDING', 'Driver inattention or decreased alertness in neighborhoods',
'Pedestrian carelessness when crossing the road' , 'Unsafe overtaking', 'Loss of vehicle control' ,
'Refusal of priority' , 'Failure to maintain a safe distance' , 'Non-use of crosswalks' ,
'Playing on the road or walking on the side of the road' ,
'Dangerous maneuvers' , 'Drowsiness' , 'Driving without a license' ,
'Driver inattention when passing a motorcycle' , 'Non respect of the direction imposed to the traffic' ,
'Lane change without signalling' , 'Driving under the influence of alcohol or drugs' , 'Non respect of the road signs' ,
'Driver inattention when leaving the parking area' , 'Driver carelessness when reversing' , 'Traffic in the wrong direction' ,
'Non-respect of the stop sign' , 'Unsafe parking or stopping' , 'Dazzle from lights' ,
'Manual use of mobile phone/ Wearing a headset' , 'Pedestrian crossing the railroad track without precaution' ,
'Other Human Factors' , 'Defective tires (burst)' , 'Defective brakes' , 'Mechanical failures' , 'Defective steering system' ,
'Lack of lighting device' , 'Non-regulation lighting device' , 'Overload' , 'Other Vehicle condition Factors' , 'Weather' ,
'Defective road' , 'Animal crossing' , 'Lack of public lighting' , 'Slippery road surface' , 'Bad road design' , 'Potholes' ,
'Glare of the sun' , 'Obstacle on the road' , 'Deformed roadway' ,
'Other State of the road infrastructure and atmospheric conditions' , 'Fatality'])
# Display the Dataframe
print(df)
# Save the Dataframe to a csv file
df.to_csv('test.csv')
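Answer from a uj5u.com user:
You can set everything to 0 and use random.sample (see the docs) to get the row indices that should be set to 1.
So, to set 15% of the 3408 rows to 1 (let's round that to 511), you can get the list of indices like this: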
from random import sample
sample(range(3408), 511)  # 511 distinct row indices between 0 and 3407
Edit: you can find alternatives under this question.
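To show how the sampled indices could be used, here is a minimal sketch that applies them to a single column. The DataFrame and the choice of the SPEEDING column are assumptions for illustration; the answer itself only shows the sampling step.

```python
from random import sample

import pandas as pd

n_rows = 3408
n_ones = round(n_rows * 0.15)  # 511

# Start with a column of zeros, then set the sampled rows to 1.
df = pd.DataFrame({"SPEEDING": [0] * n_rows})
ones_idx = sample(range(n_rows), n_ones)  # distinct indices, no repeats
df.loc[ones_idx, "SPEEDING"] = 1

print(df["SPEEDING"].sum())  # 511
```

Because sample draws without replacement, the column ends up with exactly 511 ones rather than roughly 15%.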
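Answer from a uj5u.com user:
IIUC, try something like this:
1. How I generated my df:
df = pd.DataFrame(np.zeros(shape=(100, 1)), columns=['speeding'])
2. Then:
df['speeding'] = df['speeding'].apply(lambda x: np.random.choice([0,1], p=[0.85,0.15]))
3. Check:
df.value_counts()
4. Result (each row is drawn independently, so the exact counts vary from run to run):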
speeding
0 85
1 15
dtype: int64
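If the number of 1s has to match the target exactly, one alternative is to build each column with a fixed count of 1s and then shuffle it. The sketch below is not part of the answer above. It uses the three percentages given in the question, and the seed is an arbitrary choice made only for reproducibility.

```python
import numpy as np
import pandas as pd

n_rows = 3408

# Share of 1s per column. These three values come from the question;
# the remaining columns would be added to the dict in the same way.
shares = {
    "SPEEDING": 0.1549,
    "Driver inattention or decreased alertness in neighborhoods": 0.1381,
    "Pedestrian carelessness when crossing the road": 0.0673,
}

rng = np.random.default_rng(seed=0)  # arbitrary seed (assumption)

columns = {}
for name, share in shares.items():
    n_ones = round(n_rows * share)       # exact number of 1s for this column
    col = np.zeros(n_rows, dtype="int8")
    col[:n_ones] = 1                     # put the 1s at the front...
    columns[name] = rng.permutation(col) # ...then shuffle their positions

df = pd.DataFrame(columns)
print(df.sum())  # number of 1s in each column
```

Each column is shuffled independently, so the positions of the 1s in one column say nothing about the others.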
Answer from a uj5u.com user:
import numpy as np
import sys


def get_percentages_of_ones(row_count):
    """Create dummy percentage array (share of 1s for each row)."""
    percentage_of_ones = np.zeros(row_count)
    percentage_of_ones[0] = 0.15
    percentage_of_ones[1] = 0.50
    percentage_of_ones[2] = 1.0
    return percentage_of_ones


def create_array(percentage_of_ones, row_count, col_count):
    """Build a row_count x col_count array with one shuffled 0/1 row per percentage."""
    arr = np.empty([row_count, col_count], dtype="int8")
    for row_id, po1 in enumerate(percentage_of_ones):
        nb_ones = int(round(po1 * col_count))
        nb_zeros = col_count - nb_ones
        row = np.append(
            np.zeros(nb_zeros, dtype="int8"),
            np.ones(nb_ones, dtype="int8")
        )
        np.random.shuffle(row)
        arr[row_id] = row
    return arr


def display_array(arr, col_count):
    """Print the transposed array in full, without truncation."""
    np.set_printoptions(threshold=sys.maxsize,
                        edgeitems=col_count,
                        linewidth=95)
    print(np.transpose(arr))


def save_to_csv(fname, data):
    """Write the array to a comma-separated file as integers."""
    np.savetxt(fname, data, delimiter=",", fmt="%d")


def main():
    """Generate the array, display it, and save it to test.csv."""
    row_count = 46
    col_count = 3408
    percentage_of_ones = get_percentages_of_ones(row_count)
    arr = create_array(percentage_of_ones, row_count, col_count)
    display_array(arr, col_count)
    save_to_csv("test.csv", arr)


if __name__ == "__main__":
    main()
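Note that the script above keeps one factor per row (46 x 3408) and only transposes the array for display. If the CSV should have one column per factor, as in the question, transposing before building a DataFrame is one option. The sketch below uses a small stand-in array and hypothetical column names (factor_a, factor_b, factor_c) instead of the real output of create_array.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed (assumption)

# Stand-in for the array from create_array: 3 factors x 20 records.
arr = np.zeros((3, 20), dtype="int8")
arr[0, :3] = 1    # 15% of 20
arr[1, :10] = 1   # 50% of 20
arr[2, :] = 1     # 100% of 20
for row in arr:
    rng.shuffle(row)  # shuffles each row in place (row is a view)

# Transpose so that each factor becomes a column, as in the question.
df = pd.DataFrame(arr.T, columns=["factor_a", "factor_b", "factor_c"])

# The column means are the observed shares of 1s.
shares = df.mean()
print(shares)
```

From there, df.to_csv("test.csv", index=False) would write the table with a header row and without the index column.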
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/485702.html
