Multiple list comprehensions vs. a single for loop

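I am trying to understand best practices for coding in Python. I have a pandas DataFrame whose columns contain strings or floats, and I am doing basic data wrangling on it. I would like to know whether a single for loop might be faster than many list comprehensions.

In my case the target DataFrame has 4 million rows or more and I have 10 list comprehensions, so speed matters. I have to decide whether to write everything in one for loop or as many list comprehensions. Do you have any suggestions?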

for i in range(dataframe.shape[0]):
    try:  # Price dummy
        if dataframe["Price"].iloc[i] == "0":
            dataframe["Price_Dummy"].iloc[i] = 0
        else:
            dataframe["Price_Dummy"].iloc[i] = 1
    except:
        pass
    try:  # Transform everything in MB (middle unit)
        unit_of_measure = dataframe["Size"].iloc[i].split(" ")[-1].lower()
        size = float(dataframe["Size"].iloc[i].split(" ")[0])
        if unit_of_measure == "kb":
            dataframe["Size"].iloc[i] = size / 1000
        elif unit_of_measure == "gb":
            dataframe["Size"].iloc[i] = size * 1000
        else:
            dataframe["Size"].iloc[i] = size
    except:
        pass

(10 other operations)

versus

the same operations written as list comprehensions.
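The post does not show the list comprehension variant. For illustration only, the two operations from the loop could look roughly like the sketch below. The helper names `price_dummy` and `size_in_mb` and the sample data are hypothetical and not taken from the original post.

```python
import pandas as pd


def price_dummy(price):
    # Hypothetical helper: 0 if the price is the string "0", otherwise 1
    return 0 if price == "0" else 1


def size_in_mb(size_text):
    # Hypothetical helper: convert strings such as "512 kb" or "2 GB" to MB
    try:
        tokens = size_text.split(" ")
        value = float(tokens[0])
        unit = tokens[-1].lower()
    except (AttributeError, ValueError):
        # Leave entries that cannot be parsed untouched, as the loop does
        return size_text
    if unit == "kb":
        return value / 1000
    if unit == "gb":
        return value * 1000
    return value


# Made-up sample data standing in for the real DataFrame
dataframe = pd.DataFrame({"Price": ["0", "1.99", "0"],
                          "Size": ["512 kb", "2 GB", "30 MB"]})

# One list comprehension per derived column
dataframe["Price_Dummy"] = [price_dummy(p) for p in dataframe["Price"]]
dataframe["Size"] = [size_in_mb(s) for s in dataframe["Size"]]
```

Each comprehension walks one column once, so ten comprehensions mean ten passes over the data, whereas the single loop makes one pass but pays for an `.iloc` lookup on every access.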

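I found this link: Single list iteration vs multiple list comprehensions

But it does not say whether list comprehensions are always faster, regardless of the number of iterations involved.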

A uj5u.com user replied:

I would try it without any loop, using np.where clauses to express the if-elif-else combination. This is usually fast.

import numpy as np

# dataframe is a pandas DataFrame containing the data

# Price dummy: 0 where Price is the string "0", otherwise 1
dataframe["Price_Dummy"] = np.where(dataframe["Price"] == "0", 0, 1)

# String operations work on whole string columns as well
unit_of_measure = dataframe["Size"].str.split(" ", expand=True)[1].str.lower()

size = dataframe["Size"].str.split(" ", expand=True)[0].astype("float")

# Convert everything to MB: kb and "other" first, then gb on top
kb_case = np.where(unit_of_measure == "kb", size / 1000, size)
dataframe["Size"] = np.where(unit_of_measure == "gb", size * 1000, kb_case)

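Note that in the unit_of_measure = line I replaced [-1] with [1]. With expand=True the split returns a DataFrame whose columns are labelled 0, 1, ..., so -1 is not a valid index there. You therefore have to know at which position your unit ends up.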
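If the unit is always the last token but its position varies from row to row, one possible workaround is to drop expand=True and index the per-row lists through the .str accessor, which does accept negative positions. This is a minimal sketch with made-up sample data, not part of the original answer.

```python
import pandas as pd

# Made-up sample data; the third entry has an extra leading word
sizes = pd.Series(["512 kb", "2 GB", "about 30 MB"])

# With expand=True the result is a DataFrame with columns labelled 0, 1, 2, ...
expanded = sizes.str.split(" ", expand=True)
print(expanded.columns.tolist())

# Without expand=True each row holds a list, and .str[-1] picks its last element
last_token = sizes.str.split(" ").str[-1].str.lower()
print(last_token.tolist())
```

The trade-off is that the split result stays a column of Python lists rather than separate columns, so the numeric part has to be extracted in the same way, for example with .str[0].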

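Information on splitting strings in a DataFrame can be found here.

In the last two lines I reproduced your if-elif-else combination, which has to be built from the bottom up: the final result dataframe["Size"] equals size*1000 if the unit is gb. If it is not, it equals kb_case, which covers the case where the unit is kb as well as all other cases.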
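When there are more than two or three branches, nesting np.where calls becomes hard to read. As a possible alternative (not used in the answer above), np.select takes the conditions in top-down order, like if / elif / else. The sketch below uses made-up sample data and checks that both approaches give the same values.

```python
import numpy as np
import pandas as pd

# Made-up sample data
dataframe = pd.DataFrame({"Size": ["512 kb", "2 GB", "30 MB"]})

parts = dataframe["Size"].str.split(" ", expand=True)
size = parts[0].astype("float").to_numpy()
unit_of_measure = parts[1].str.lower()

is_kb = (unit_of_measure == "kb").to_numpy()
is_gb = (unit_of_measure == "gb").to_numpy()

# Nested np.where, as in the answer: the innermost branch is built first
kb_case = np.where(is_kb, size / 1000, size)
nested = np.where(is_gb, size * 1000, kb_case)

# np.select: the first matching condition wins, default covers the "else" branch
selected = np.select([is_kb, is_gb], [size / 1000, size * 1000], default=size)

print(nested.tolist())
print(selected.tolist())
```

Both variants evaluate every branch for every row and then pick per element, so they differ mainly in readability.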
