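I would like to speed up the following code. The dataset is a list of trades that I want to stress-test by simulating various parameters, storing all the results in one table.
The way I do it is to define the parameter ranges and iterate over their values. For each combination I make a copy of the dataset, assign the parameter values to new columns, and then concatenate everything into one huge DataFrame.
Does anyone have a good idea for avoiding the three for loops used to build the DataFrame?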
```python
import pandas as pd

# `pos` is the initial trade dataset: a DataFrame with (at least) the
# columns 'spot', 'spread' and 'ImpliedVolatility'

# Defining the range of parameters to simulate
volchange = range(-1, 2)
spreadchange = range(-10, 11)
flatchange = range(-10, 11)

# The DataFrame where I store all the results
final_result = pd.DataFrame()

# Iterating over the range of parameters
for vol in volchange:
    for spread in spreadchange:
        for flat in flatchange:
            # Creating a copy of the initial dataset, assigning the simulated values to three
            # new columns and concatenating it with the rest, resulting in a DataFrame which is
            # several times the initial dataset, with all the possible triplets of parameters
            inter_pos = pos.copy()
            inter_pos['vol_change[pts]'] = vol
            inter_pos['spread_change[%]'] = spread
            inter_pos['spot_change[%]'] = flat
            final_result = pd.concat([final_result, inter_pos], axis=0)

# Performing computation at DataFrame level
final_result['sim_vol'] = final_result['vol_change[pts]'] + final_result['ImpliedVolatility']
final_result['spread_change'] = final_result['spread'].multiply(final_result['spread_change[%]']) / 100
final_result['sim_spread'] = final_result['spread'] + final_result['spread_change']
final_result['spot_change'] = final_result['spot'] * final_result['spot_change[%]'] / 100
final_result['sim_spot'] = final_result['spot'] + final_result['spot_change']
final_result['sim_price'] = final_result['sim_spot'] - final_result['sim_spread']
```
Thank you very much for your help!
Have a nice week!
Answer:
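Concatenating pandas DataFrames onto each other one at a time is slow: every pd.concat call copies all the rows accumulated so far, so the total work grows quadratically with the number of iterations. It is better to collect the DataFrames in a list and then call pd.concat once to join them all.
You can test this yourself like this: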
```python
import pandas as pd
import numpy as np
from time import time

# Approach 1: collect the DataFrames in a list, then concatenate once
dfs = []
columns = [f"{i:02d}" for i in range(100)]
time_start = time()
for i in range(100):
    data = np.random.random((10000, 100))
    df = pd.DataFrame(columns=columns, data=data)
    dfs.append(df)
new_df = pd.concat(dfs)
time_end = time()
print(f"Time elapsed: {time_end-time_start}")
# Time elapsed: 1.851675271987915

# Approach 2: concatenate onto the growing DataFrame inside the loop
new_df = pd.DataFrame(columns=columns)
time_start = time()
for i in range(100):
    data = np.random.random((10000, 100))
    df = pd.DataFrame(columns=columns, data=data)
    new_df = pd.concat([new_df, df])
time_end = time()
print(f"Time elapsed: {time_end-time_start}")
# Time elapsed: 12.258363008499146
```
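Applied to the loop from the question, the list-then-concat pattern could look roughly like the sketch below. It assumes `pos` is the trade DataFrame from the question; the two-row `pos` built here (with made-up `spot`, `spread` and `ImpliedVolatility` values) is only a hypothetical stand-in so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the question's trade dataset `pos`
pos = pd.DataFrame({
    "spot": [100.0, 50.0],
    "spread": [2.0, 1.0],
    "ImpliedVolatility": [20.0, 25.0],
})

volchange = range(-1, 2)
spreadchange = range(-10, 11)
flatchange = range(-10, 11)

# Collect one DataFrame per parameter triplet; nothing is concatenated yet
frames = []
for vol in volchange:
    for spread in spreadchange:
        for flat in flatchange:
            # assign() returns a new DataFrame, so no explicit copy() is needed
            frames.append(pos.assign(**{
                "vol_change[pts]": vol,
                "spread_change[%]": spread,
                "spot_change[%]": flat,
            }))

# A single concat at the end copies each row only once
final_result = pd.concat(frames, ignore_index=True)
print(final_result.shape)  # → (2646, 6)
```

`ignore_index=True` gives the result a fresh 0..n-1 index instead of repeating the index of `pos` for every triplet; drop it if the original index is needed.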
You can also use itertools.product to get rid of the nested for loops.
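A minimal sketch of that idea: `itertools.product` yields the (vol, spread, flat) triplets in the same order as the three nested loops, so a single comprehension replaces them. As before, `pos` is a hypothetical stand-in for the question's dataset.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for the question's trade dataset `pos`
pos = pd.DataFrame({"spot": [100.0, 50.0], "spread": [2.0, 1.0],
                    "ImpliedVolatility": [20.0, 25.0]})

volchange = range(-1, 2)
spreadchange = range(-10, 11)
flatchange = range(-10, 11)

# One flat loop over every parameter triplet instead of three nested loops
frames = [
    pos.assign(**{"vol_change[pts]": vol,
                  "spread_change[%]": spread,
                  "spot_change[%]": flat})
    for vol, spread, flat in itertools.product(volchange, spreadchange, flatchange)
]

# Still a single concat at the end
final_result = pd.concat(frames, ignore_index=True)
```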
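And, as @Ahmed AEK suggested, you can pass data=itertools.product(volchange, spreadchange, flatchange) directly to pd.DataFrame and skip building the list yourself, which they describe as a more memory-efficient and faster approach.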
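Taking that one step further, the parameter grid can be built as a small DataFrame straight from `itertools.product` and combined with every trade in one cross join, so no per-triplet copies of `pos` are made at all. This sketch swaps in `DataFrame.merge(how="cross")` (available since pandas 1.2), which is not part of the answer above; `pos` is again a hypothetical stand-in, and the column formulas follow the question's code.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for the question's trade dataset `pos`
pos = pd.DataFrame({"spot": [100.0, 50.0], "spread": [2.0, 1.0],
                    "ImpliedVolatility": [20.0, 25.0]})

volchange = range(-1, 2)
spreadchange = range(-10, 11)
flatchange = range(-10, 11)

# One row per parameter triplet, built directly from the product iterator
params = pd.DataFrame(
    itertools.product(volchange, spreadchange, flatchange),
    columns=["vol_change[pts]", "spread_change[%]", "spot_change[%]"],
)

# Cross join: every trade paired with every triplet (len(pos) * len(params) rows)
final_result = pos.merge(params, how="cross")

# Same column-wise computations as in the question, done once on the full frame
final_result["sim_vol"] = final_result["vol_change[pts]"] + final_result["ImpliedVolatility"]
final_result["spread_change"] = final_result["spread"] * final_result["spread_change[%]"] / 100
final_result["sim_spread"] = final_result["spread"] + final_result["spread_change"]
final_result["spot_change"] = final_result["spot"] * final_result["spot_change[%]"] / 100
final_result["sim_spot"] = final_result["spot"] + final_result["spot_change"]
final_result["sim_price"] = final_result["sim_spot"] - final_result["sim_spread"]
```

The cross join keeps the trades in their original order, each followed by all 1,323 triplets, so the layout matches what the loop-based versions produce.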
