Avoiding row iteration in Python pandas

I have loaded the following file into a pandas df:

https://www.fca.org.uk/publication/data/position-limits-contract-names-vpc.xlsx

I have converted the rows that are relevant to me into a dict. The dict has the form {principal: [spot, aggregate, set(product codes)]}. I used the following code to build it:

from collections import defaultdict

ifeu_dict = defaultdict(lambda: [0, 0, set()])
for (_, row) in df.iterrows():
    if row.loc["Venue MIC"] == "IFEU":
        ifeu_dict[row.loc["Principal Venue Product Code"]][2].add(row.loc["Venue Product Codes"])
        if type(row.loc["Spot month single limit#"]) == int:
            # no need to append, since the defaultdict creates the entry by default
            ifeu_dict[row.loc["Principal Venue Product Code"]][0] = row.loc["Spot month single limit#"]
            ifeu_dict[row.loc["Principal Venue Product Code"]][1] = row.loc["Other month limit#"]
        if type(row.loc["Spot month single limit#"]) == str:
            try:
                val = int(str(row.loc["Spot month single limit#"]).split()[0].replace(",", ""))
                val_2 = int(str(row.loc["Other month limit#"]).split()[0].replace(",", ""))
                ifeu_dict[row.loc["Principal Venue Product Code"]][0] = val
                ifeu_dict[row.loc["Principal Venue Product Code"]][1] = val_2
            except ValueError:
                pass

However, this is really inefficient, so I have been trying to change the way I build this dict.

One attempt looks like this:

ifeu_dict_2 = defaultdict(lambda: [0, 0, set()])

ifeu_mask = df["Venue MIC"] == "IFEU"
ifeu_df = df.loc[ifeu_mask]
spot_mask_int = ifeu_df["Spot month single limit#"].apply(type) == int


def spot_transform(x):
    try:
        return int(str(x).split()[0].replace(",", ""))
    except ValueError:
        return


ifeu_df["Spot month single limit#"] = ifeu_df.loc[~spot_mask_int, "Spot month single limit#"].apply(spot_transform)
ifeu_df["Other month limit#"] = ifeu_df.loc[~spot_mask_int, "Other month limit#"].apply(spot_transform)
spot_mask_int = ifeu_df["Spot month single limit#"].apply(type) == int

Then I tried:

temp_df = [~spot_mask_int, ["Principal Venue Product Code", "Spot month single limit#", "Other month limit#"]]
ifeu_dict_2[temp_df.loc["Principal Venue Product Code"]][0] = temp_df.loc["Spot month single limit#"]

# this gives me AttributeError: 'list' object has no attribute 'loc'
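
A note on the AttributeError above: the right-hand side of the `temp_df` assignment is only a pair of square brackets, so Python builds a plain list rather than selecting from the DataFrame. The sketch below illustrates the difference on a toy frame; the rows and the names `toy`, `mask`, `as_list` and `as_frame` are made up for illustration, and only the column names follow the question.

```python
import pandas as pd

# Toy frame with made-up rows; only the column names follow the question.
toy = pd.DataFrame({
    "Principal Venue Product Code": ["ATW", "ULH"],
    "Spot month single limit#": [5550, 2500],
})
mask = pd.Series([True, False], index=toy.index)
columns = ["Principal Venue Product Code", "Spot month single limit#"]

# Square brackets on their own build a plain list, which has no .loc accessor.
as_list = [mask, columns]

# Selecting through .loc returns a DataFrame, which does support .loc.
as_frame = toy.loc[mask, columns]
```

Writing `ifeu_df.loc[~spot_mask_int, [...]]` would presumably have been the intended form, although the dict lookup on the next line would then still fail for the reason shown by the second error.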

Or:

ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][2].add(ifeu_df.loc["Venue Product Codes"])
ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][0] = ifeu_df.loc[spot_mask_int, "Spot month single limit#"]
ifeu_dict_2[ifeu_df.loc[spot_mask_int, "Principal Venue Product Code"]][1] = ifeu_df.loc[spot_mask_int, "Other month limit#"]

# this gives me TypeError: 'Series' objects are mutable, thus they cannot be hashed
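
A note on the TypeError above: a dict key has to be hashable, and a whole pandas Series is not, so a dict cannot be indexed with a column selection. One possible workaround, sketched here on made-up rows, is to walk the columns in parallel with `zip` so that each key is a single scalar. The names `toy` and `limits` are hypothetical.

```python
from collections import defaultdict

import pandas as pd

# Toy frame with made-up rows; only the column names follow the question.
toy = pd.DataFrame({
    "Principal Venue Product Code": ["ATW", "ULH"],
    "Spot month single limit#": [5550, 2500],
    "Other month limit#": [38800, 2500],
})

# A Series cannot be hashed, which is why it cannot serve as a dict key.
try:
    hash(toy["Principal Venue Product Code"])
    series_hashable = True
except TypeError:
    series_hashable = False

limits = defaultdict(lambda: [0, 0, set()])

# zip yields one scalar per column and row, so every key is a plain string.
for code, spot, other in zip(
    toy["Principal Venue Product Code"],
    toy["Spot month single limit#"],
    toy["Other month limit#"],
):
    limits[code][0] = int(spot)
    limits[code][1] = int(other)
```

This still loops in Python, but it avoids building a Series object for every row the way `iterrows` does.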

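I have been stuck for quite a while and am not sure how to proceed. Any help, whether an answer or a useful link, would be greatly appreciated! (Regarding links: I am new to coding, so examples help me the most at the moment.)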


If you want a df to play with:

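Index(['Commodity Derivative Name\n(including associated contracts)',
       'Venue MIC', 'Name of Trading Venue', 'Venue Product Codes',
       'Principal Venue Product Code', 'Spot month single limit#',
       'Other month limit#', 'Conversion Factor', 'Unit of measurement',
       'Definition of spot month'],
      dtype='object')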

API2 Rotterdam Coal Average Price Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RCA,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
Gasoil Diff - Gasoil 50ppm FOB Rotterdam Barges vs Low Sulphur Gasoil 1st Line Future,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ULH,ULH,2500,2500,nan,Lots,Calendar Month
Marine Fuel 0.5% FOB Rotterdam Barges (Platts) Future,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,MF3,MF3,2500,2500,nan,Lots,Calendar Month
API2 Rotterdam Coal (supports Cal 1x Options),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATC,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal (supports Quarterly 1x Options),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATQ,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early 1x Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATD,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (122 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDE,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (214 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDF,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Early (305 days) Single Expiry Option (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RDG,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal Futures,IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATW,ATW,5,550 (24.9%),38,800 (20.5%),nan,Lots,Calendar Month
API2 Rotterdam Coal Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,RCO,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month
API2 Rotterdam Coal 1st Quarter Options (Futures Style Margin),IFEU,INTERCONTINENTAL EXCHANGE - ICE FUTURES EUROPE,ATH,ATW,Aggregated with Principal,Aggregated with Principal,nan,Lots,Calendar Month

An entry in the finished dict should look like this:

ATW = [5550, 38800, {'ATH', 'ATC', 'RDF', 'ATQ', 'RCA', 'ATD', 'RCO', 'RDG', 'RDE', 'ATW'}]

Reply from a uj5u.com user:

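Having looked at the data, I understand now. The data contains several codes for each product, and you need to end up with a dict that has one entry per group of codes. Your approach processes the data row by row, but a more efficient way is to use the DataFrame.groupby method and process each group in one go.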
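
As a small illustration of the grouping idea, the sketch below collects the venue codes of each principal code into a set. The rows are made up and the names `toy` and `codes_by_principal` are hypothetical; only the two column names come from the question.

```python
import pandas as pd

# Made-up rows: three codes share the principal ATW, one stands alone.
toy = pd.DataFrame({
    "Venue Product Codes": ["RCA", "ATC", "ATW", "ULH"],
    "Principal Venue Product Code": ["ATW", "ATW", "ATW", "ULH"],
})

# groupby splits the rows by principal code; apply(set) then turns the
# venue codes of each group into one set.
codes_by_principal = (
    toy.groupby("Principal Venue Product Code")["Venue Product Codes"]
    .apply(set)
    .to_dict()
)
```

Each key of the resulting dict is a principal code, which corresponds to the third element of the list the question asks for.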

Applied to your DataFrame, the following code should be more efficient than processing row by row:

df_ifeu = df[df['Venue MIC'] == 'IFEU']

ifeu_dict = {}
for principal, g in df_ifeu.groupby('Principal Venue Product Code'):
    # find the row whose product code is the same as the principal code
    pr = g['Venue Product Codes'] == principal
    # get the limit values for the principal code
    spot_val = g.loc[pr, 'Spot month single limit#'].iloc[0]
    other_val = g.loc[pr, 'Other month limit#'].iloc[0]
    # get the codes
    codes = set(g['Venue Product Codes'])
    # add the product to the dict
    ifeu_dict[principal] = [spot_val, other_val, codes]

# confirm that we have one dict entry per principal product code
assert len(ifeu_dict) == df_ifeu['Principal Venue Product Code'].nunique()
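
The loop above stores the cells of the principal row unchanged, so a text cell such as `5,550 (24.9%)` would land in the dict as a string rather than as the integer shown in the question's expected output. A possible extension, sketched under assumptions, is to run each value through a small parser modelled on the question's `spot_transform`. The helper name `parse_limit`, the choice to skip groups without a principal row, and the abbreviated sample rows are all assumptions made for illustration.

```python
import pandas as pd


def parse_limit(value):
    """Hypothetical helper: '5,550 (24.9%)' -> 5550, unparseable text -> None."""
    try:
        return int(str(value).split()[0].replace(",", ""))
    except (ValueError, IndexError):
        return None


# A few rows modelled on the sample in the question (contract names omitted).
df = pd.DataFrame({
    "Venue MIC": ["IFEU"] * 4,
    "Venue Product Codes": ["RCA", "ATW", "RCO", "ULH"],
    "Principal Venue Product Code": ["ATW", "ATW", "ATW", "ULH"],
    "Spot month single limit#": [
        "Aggregated with Principal", "5,550 (24.9%)",
        "Aggregated with Principal", 2500,
    ],
    "Other month limit#": [
        "Aggregated with Principal", "38,800 (20.5%)",
        "Aggregated with Principal", 2500,
    ],
})

df_ifeu = df[df["Venue MIC"] == "IFEU"]

ifeu_dict = {}
for principal, g in df_ifeu.groupby("Principal Venue Product Code"):
    # The row whose own code equals the principal code carries the limits.
    pr = g["Venue Product Codes"] == principal
    if not pr.any():
        continue  # assumption: skip groups that have no principal row
    ifeu_dict[principal] = [
        parse_limit(g.loc[pr, "Spot month single limit#"].iloc[0]),
        parse_limit(g.loc[pr, "Other month limit#"].iloc[0]),
        set(g["Venue Product Codes"]),
    ]
```

Parsing only the principal row of each group keeps the number of Python-level conversions equal to the number of groups rather than the number of rows.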
