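I want to perform an operation between each row and every other row of a DataFrame. The obvious approach is a nested for loop, which I expect to be very slow.
Is there a faster, better way to achieve the same result?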
This is the DataFrame, where each row is a user vector and the index holds the usernames. In practice there can be hundreds of usernames.
import pandas as pd
df1 = pd.DataFrame({"A":[11,2,3], "B":[4,5,6], "C":[7,8,9]}, index=["U1","U2", "U3"])
Nested Loop Method
import numpy as np
def some_func(u1_vec, u2_vec):
    # this could be any function of the two user vectors above
    return np.minimum(u1_vec, u2_vec).sum() / np.maximum(u1_vec, u2_vec).sum()
index_list = list(df1.index) # contains usernames
vector_cols = list(df1.columns) # contains colnames
min_max_all = {} # will be used to store the vector interaction
for index_u1 in index_list:
    u1_vec = df1.loc[index_u1, vector_cols]
    min_max_all[index_u1] = {}
    for index_u2 in index_list:
        u2_vec = df1.loc[index_u2, vector_cols]
        min_max_all[index_u1][index_u2] = some_func(u1_vec, u2_vec)
Result - min_max_all
{
'U1': {'U1': 1.0, 'U2': 0.5416666666666666, 'U3': 0.5384615384615384},
'U2': {'U1': 0.5416666666666666, 'U2': 1.0, 'U3': 0.8333333333333334},
'U3': {'U1': 0.5384615384615384, 'U2': 0.8333333333333334, 'U3': 1.0}
}
Answer from a uj5u.com user:
I think the best approach is to use NumPy and write code tailored to this one function.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"A":[11,2,3], "B":[4,5,6], "C":[7,8,9]}, index=["U1","U2", "U3"])
df1_np = df1.to_numpy()
x = np.minimum(df1_np[:, np.newaxis], df1_np).sum(axis=2)
y = np.maximum(df1_np[:, np.newaxis], df1_np).sum(axis=2)
print(x/y)
array([[1.        , 0.54166667, 0.53846154],
       [0.54166667, 1.        , 0.83333333],
       [0.53846154, 0.83333333, 1.        ]])
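The broadcasting step can be checked against the nested loop from the question. The following is only a sketch: the helper names pairwise_min_max and loop_min_max are hypothetical, introduced for this illustration. It relies on arr[:, np.newaxis] having shape (n, 1, m), so that combining it with arr of shape (n, m) produces an (n, n, m) intermediate holding every row paired with every row.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [11, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                   index=["U1", "U2", "U3"])

def pairwise_min_max(arr):
    """Hypothetical helper: min/max ratio for every pair of rows of a 2-D array."""
    # (n, 1, m) broadcast against (n, m) gives (n, n, m); summing over the
    # last axis collapses the columns and leaves an (n, n) matrix.
    mins = np.minimum(arr[:, np.newaxis], arr).sum(axis=2)
    maxs = np.maximum(arr[:, np.newaxis], arr).sum(axis=2)
    return mins / maxs

def loop_min_max(arr):
    """Hypothetical helper: the same computation with explicit nested loops."""
    n = arr.shape[0]
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = (np.minimum(arr[i], arr[j]).sum()
                         / np.maximum(arr[i], arr[j]).sum())
    return out

arr = df1.to_numpy()
fast = pairwise_min_max(arr)
slow = loop_min_max(arr)
print(fast.shape)                # one row and one column per user
print(np.allclose(fast, slow))   # both methods should agree
```

Note that the intermediate array grows as n * n * m, so with many thousands of users it may be worth processing the rows in chunks; for a few hundred users and a handful of columns it stays small.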
To build a dictionary like the one in your question:
z = x/y
# key by username (df1.index), not by column name
{ci: {cj: z[i][j] for j, cj in enumerate(df1.index)}
 for i, ci in enumerate(df1.index)}
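As an alternative to a hand-written comprehension, the ratio matrix can be wrapped in a DataFrame and converted with to_dict. This is a sketch that assumes both the outer and inner keys should be the usernames in df1.index, as in the result shown in the question; the names ratio_df and min_max_all_fast are made up for this example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [11, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                   index=["U1", "U2", "U3"])

arr = df1.to_numpy()
z = (np.minimum(arr[:, np.newaxis], arr).sum(axis=2)
     / np.maximum(arr[:, np.newaxis], arr).sum(axis=2))

# Label both axes with the usernames, then convert row by row:
# orient="index" yields {row_label: {column_label: value}}.
ratio_df = pd.DataFrame(z, index=df1.index, columns=df1.index)
min_max_all_fast = ratio_df.to_dict(orient="index")

print(min_max_all_fast["U1"]["U2"])
```

Keeping the result as ratio_df instead of a dictionary also allows label-based lookups such as ratio_df.loc["U1", "U2"].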
