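I'm trying to build a machine learning algorithm for my homework. The data I use for training and testing has 17k rows and 20 columns. I tried to add a new column based on two other columns, but the for loop I wrote is too slow (it takes 3 seconds to run):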
for i in range(0, len(model_olculeri)):
    if (model_olculeri["Bel"][i] != 0) and (model_olculeri["Basen"][i] != 0):
        sum_column = (model_olculeri["Bel"][i]) / (model_olculeri["Basen"][i])
        model_olculeri["Waist to Hip Ratio"][i] = sum_column
I've read that pandas and NumPy vectorization is faster and more efficient than a for loop over a pandas DataFrame. How can I vectorize my for loop? Thanks a lot.
A uj5u.com user replied:
Create a boolean mask and use it to filter the rows:
m = (model_olculeri["Bel"] != 0) & (model_olculeri["Basen"] != 0)
model_olculeri.loc[m,"Waist to Hip Ratio"] = model_olculeri.loc[m, "Bel"] / model_olculeri.loc[m,"Basen"]
Alternatively, divide the full columns and let index alignment assign only the masked rows:
model_olculeri.loc[m,"Waist to Hip Ratio"] = model_olculeri["Bel"] / model_olculeri["Basen"]
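To see why both forms above give the same column, here is a minimal sketch on made-up data. Only the column names "Bel" and "Basen" come from the question; the values and the names df, a and b are hypothetical. Rows where the mask is False are simply left as NaN in the new column.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
df = pd.DataFrame({
    "Bel": [70.0, 0.0, 80.0, 90.0],
    "Basen": [100.0, 95.0, 0.0, 120.0],
})

# True only where both measurements are non-zero.
m = (df["Bel"] != 0) & (df["Basen"] != 0)

# Form 1: filter both operands with the mask before dividing.
a = df.copy()
a.loc[m, "Waist to Hip Ratio"] = a.loc[m, "Bel"] / a.loc[m, "Basen"]

# Form 2: divide the full columns; .loc aligns on the index and
# writes only the rows selected by the mask.
b = df.copy()
b.loc[m, "Waist to Hip Ratio"] = b["Bel"] / b["Basen"]

print(a)
print(a.equals(b))
```

The second form does a little extra work, because it also divides the rows that are later discarded, but it is shorter to write.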
Or set the new values with numpy.where:
import numpy as np
model_olculeri["Waist to Hip Ratio"] = np.where(m, model_olculeri["Bel"] / model_olculeri["Basen"], np.nan)
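As a rough sanity check, the sketch below compares a row-by-row loop with the numpy.where version on synthetic data of the size mentioned in the question. The data are random stand-ins, not the asker's dataset, and the names df, looped and vectorized are hypothetical. The loop here uses .at instead of the chained indexing from the question, which is an assumption made so that it runs on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 17k rows as in the question,
# with small integers so that some zeros occur. The values are made up.
rng = np.random.default_rng(0)
n = 17_000
df = pd.DataFrame({
    "Bel": rng.integers(0, 5, size=n),
    "Basen": rng.integers(0, 5, size=n),
})

# Row-by-row version, kept only as a reference for the comparison.
looped = df.copy()
looped["Waist to Hip Ratio"] = np.nan
for i in range(len(looped)):
    if looped.at[i, "Bel"] != 0 and looped.at[i, "Basen"] != 0:
        looped.at[i, "Waist to Hip Ratio"] = looped.at[i, "Bel"] / looped.at[i, "Basen"]

# Vectorized version: one mask, one division, one np.where call.
vectorized = df.copy()
m = (vectorized["Bel"] != 0) & (vectorized["Basen"] != 0)
vectorized["Waist to Hip Ratio"] = np.where(
    m, vectorized["Bel"] / vectorized["Basen"], np.nan
)

# Both approaches should agree, treating NaN in the same position as equal.
same = np.allclose(
    looped["Waist to Hip Ratio"], vectorized["Waist to Hip Ratio"], equal_nan=True
)
print(same)
```

Note that the vectorized division is evaluated for every row, including those with a zero denominator. pandas returns inf or NaN there instead of raising, and numpy.where then replaces those rows with NaN because the mask is False.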
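A uj5u.com user replied:
A chained solution using query and pipe. The column name contains spaces, so it cannot be written as a keyword argument and is passed to assign through a dict instead. Unlike the mask-based answers, this returns a new DataFrame without the rows that contain zeros:
model_olculeri.query("Bel != 0 & Basen != 0").pipe(lambda x: x.assign(**{"Waist to Hip Ratio": x["Bel"] / x["Basen"]}))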
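The following sketch shows how the query-based approach behaves on made-up data: the result contains only the rows that pass the filter, and the original frame is not modified. The values and the names df, result and joined are hypothetical; joining the new column back is one possible way to keep all rows, not something the answer itself proposes.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "Bel": [70.0, 0.0, 80.0, 90.0],
    "Basen": [100.0, 95.0, 0.0, 120.0],
})

# Filter first, then add the ratio column to the filtered copy.
# The dict unpacking is needed because the column name contains spaces.
result = df.query("Bel != 0 & Basen != 0").pipe(
    lambda x: x.assign(**{"Waist to Hip Ratio": x["Bel"] / x["Basen"]})
)
print(result)

# Optional: align the new column back onto the original index,
# which leaves NaN in the rows that were filtered out.
joined = df.join(result["Waist to Hip Ratio"])
print(joined)
```

If the rows with zeros are needed later, for example as training data for other features, the mask or numpy.where variants are probably the more convenient choice.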
