需要明確的是,以下是我正在嘗試做的事情。問題是,如何更改函式,oper_AB()以便代替嵌套的 for 回圈,我使用 numpy 中的矢量化/廣播并獲得ret_list更快的速度?
def oper(a_1D, b_1D):
return np.dot(a_1D, b_1D) / np.dot(b_1D, b_1D)
def oper_AB(A_2D, B_2D):
ret_list = []
for a_1D in A_2D:
for b_1D in B_2D:
ret_list.append(oper(a_1D, b_1D))
return ret_list
uj5u.com熱心網友回復:
這應該有效。
result = (np.matmul(A_2D, B_2D.transpose())/np.sum(B_2D*B_2D,axis=1)).flatten()
但是由于快取利用率的原因,第二個實作會更快。
def oper_AB(A_2D, B_2D):
b_squared = np.sum(B_2D*B_2D,axis=1).reshape([-1,1])
b_normalized = B_2D/b_squared
del b_squared
returned_val = np.matmul(A_2D,b_normalized.transpose())
return returned_val.flatten()
的del是那里只是如果B_2D分配的記憶體過大,(或者它只是我用來與多個GB陣列作業)
編輯:根據 A_1D - B_1D 的要求
def oper2_AB(A_2D, B_2D):
output = np.zeros([A_2D.shape[0]*B_2D.shape[0],A_2D.shape[1]],dtype=A_2D.dtype)
for i in range(len(A_2D)):
output[i*len(B_2D):(i 1)*len(B_2D)] = A_2D[i]-B_2D
return output
uj5u.com熱心網友回復:
嚴格解決這個問題(我懷疑 OP 想要規范,而不是規范的平方,如下面的除數):
r = a @ b.T / np.linalg.norm(b, axis=1)**2
例子:
np.random.seed(0)
a = np.random.randint(0, 10, size=(2,2))
b = np.random.randint(0, 10, size=(2,2))
然后:
>>> a
array([[5, 0],
[3, 3]])
>>> b
array([[7, 9],
[3, 5]])
>>> oper_AB(a, b)
[0.2692307692307692,
0.4411764705882353,
0.36923076923076925,
0.7058823529411765]
>>> a @ b.T / np.linalg.norm(b, axis=1)**2
array([[0.26923077, 0.44117647],
[0.36923077, 0.70588235]])
>>> np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
array([0.26923077, 0.44117647, 0.36923077, 0.70588235])
速度:
n, m = 1000, 100
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
orig = %timeit -o oper_AB(a, b)
# 2.73 s ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 2.22 ms ± 33.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
orig.average / new.average
# 1228.78 (speedup)
我們的解決方案比原始解決方案快 1200 倍。
正確性:
>>> np.allclose(np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2), oper_AB(a, b))
True
大型陣列的速度,與@Ahmed AEK 的解決方案相比:
n, m = 2000, 2000
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 86.5 ms ± 484 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
other = %timeit -o AEK(a, b) # Ahmed AEK's answer
# 102 ms ± 379 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
我們的解決方案快了 15% :-)
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/384716.html
