I have a pd.DataFrame whose rows contain values:
import numpy as np
import pandas as pd
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6], "col2": [6, 5, 4, 3, 2, 1]})
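I now want an efficient way to build an np.array matrix from the output of a function applied to the two columns, for every pair of rows:
def my_function(x1, x2, y1, y2):
    # True when the first row has the larger col1 value and the smaller col2 value
    return x1 > y1 and x2 < y2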
A simple O(N^2) way to solve this is as follows:
matrix = []
for _, (x1, x2) in df.iterrows():
    row = []
    for _, (y1, y2) in df.iterrows():
        row.append(my_function(x1, x2, y1, y2))
    matrix.append(row)
This gives us:
>>> np.array(matrix)
array([[False, False, False, False, False, False],
[ True, False, False, False, False, False],
[ True, True, False, False, False, False],
[ True, True, True, False, False, False],
[ True, True, True, True, False, False],
[ True, True, True, True, True, False]])
Is there a more efficient approach that scales to more values?
A uj5u.com user replied:
You can try np.vectorize:
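def my_function(x, y):
    # x and y are single records of the form (col1, col2)
    x1, x2 = x
    y1, y2 = y
    return x1 > y1 and x2 < y2
# One record per DataFrame row
arr = df.to_records(index=False)
f_vfunc = np.vectorize(my_function)
# arr[:, None] has shape (N, 1), so every row is paired with every other row
r = f_vfunc(arr[:, None], arr)
print(r)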
[[False False False False False False]
[ True False False False False False]
[ True True False False False False]
[ True True True False False False]
[ True True True True False False]
[ True True True True True False]]
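A uj5u.com user replied:
numpy.vectorize is not needed here. You can easily write the vectorized code directly (and vectorize does not improve speed, since it behaves like a loop):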
a = df['col1'].to_numpy()
b = df['col2'].to_numpy()
# Comparing a column vector (N, 1) with a row vector (N,) broadcasts to (N, N)
matrix = (a[:, None] > a) & (b[:, None] < b)
Output:
array([[False, False, False, False, False, False],
[ True, False, False, False, False, False],
[ True, True, False, False, False, False],
[ True, True, True, False, False, False],
[ True, True, True, True, False, False],
[ True, True, True, True, True, False]])
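To check that the broadcast expression really matches the double loop from the question, here is a minimal sketch. It assumes the same df as in the question; the helper names pairwise_matrix and loop_matrix are made up for illustration and are not part of the original answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6], "col2": [6, 5, 4, 3, 2, 1]})

def pairwise_matrix(frame):
    """Broadcast version: entry [i, j] compares row i against row j."""
    a = frame["col1"].to_numpy()
    b = frame["col2"].to_numpy()
    # a[:, None] has shape (N, 1); comparing it with a (shape (N,)) yields (N, N)
    return (a[:, None] > a) & (b[:, None] < b)

def loop_matrix(frame):
    """Reference version: explicit O(N^2) Python loop, as in the question."""
    rows = []
    for x1, x2 in frame.itertuples(index=False):
        rows.append(
            [x1 > y1 and x2 < y2 for y1, y2 in frame.itertuples(index=False)]
        )
    return np.array(rows)

matrix = pairwise_matrix(df)
# Both approaches should agree element by element
assert np.array_equal(matrix, loop_matrix(df))
```

Note that the result always holds N x N booleans, so memory use grows quadratically with the number of rows regardless of which approach computes it.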
Speed comparison:
%%timeit
f_vfunc(arr[:, None], arr)
37.2 μs ± 256 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
(a[:,None]>a)&(b[:,None]<b)
2.44 μs ± 84.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
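The timings above use IPython's %%timeit magic on six rows. Outside IPython, a rough comparison can be sketched with the standard timeit module. The size n, the random data, and the four-argument form of the vectorized function are assumptions for illustration (the answer above passes records instead), and the absolute numbers will vary by machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical size, larger than the six rows in the question
a = rng.integers(0, 100, size=n)
b = rng.integers(0, 100, size=n)

def broadcast_version():
    # Pure NumPy: two broadcast comparisons combined with a logical AND
    return (a[:, None] > a) & (b[:, None] < b)

# np.vectorize calls the Python function once per element pair
vfunc = np.vectorize(lambda x1, x2, y1, y2: x1 > y1 and x2 < y2)

def vectorize_version():
    return vfunc(a[:, None], b[:, None], a, b)

# Both versions should produce the same matrix
assert np.array_equal(broadcast_version(), vectorize_version())

t_broadcast = timeit.timeit(broadcast_version, number=5)
t_vectorize = timeit.timeit(vectorize_version, number=5)
print(f"broadcast: {t_broadcast:.4f} s, np.vectorize: {t_vectorize:.4f} s")
```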
