Pandas df.apply with Python's lru_cache: "'Series' objects are mutable, thus they cannot be hashed"

I have a df, which you can reproduce by running the following code:

import pandas as pd
from io import StringIO
from functools import lru_cache

df = """
  contract      EndDate     
  A00118        123456
  A00118        12345   
"""
# Split columns on runs of whitespace
df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')

The output is:

    contract    EndDate
0   A00118     123456
1   A00118     12345

Then I apply some logic to each row:

def var_func(row,n):
    res=row['EndDate']*100*n
    return res

df['annfact'] = df.apply(lambda row: var_func(row,10), axis=1)

The output is:

    contract    EndDate annfact
0   A00118     123456   123456000
1   A00118     12345    12345000

However, if I put Python's lru_cache on this function:

@lru_cache(maxsize = None)
def var_func(row,n):
    res=row['EndDate']*100*n
    return res

df['annfact'] = df.apply(lambda row: var_func(row,10), axis=1)

I get this error:

TypeError: ("'Series' objects are mutable, thus they cannot be hashed", 'occurred at index 0')

Can anyone help? I want to use Python's lru_cache together with the pandas apply function. For certain reasons I have to use apply only, not vectorized NumPy methods.

Answer:

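From the documentation:

Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable.

With df.apply(..., axis=1), each row is passed to the function as a Series object, which is not hashable, so you get the error.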
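
To see the hashability difference directly, here is a minimal sketch. The helper name is_hashable and the sample row are made up for illustration. It only checks whether hash() succeeds, which is the same requirement lru_cache places on its arguments:

```python
import pandas as pd

# Hypothetical sample row, shaped like one row of the question's df
row = pd.Series({"contract": "A00118", "EndDate": 123456})

def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

series_hashable = is_hashable(row)        # Series refuses to be hashed
tuple_hashable = is_hashable(tuple(row))  # a tuple of hashable values is fine
print(series_hashable, tuple_hashable)
```

Any fix therefore comes down to handing the cached function something hashable (a scalar, a tuple, a namedtuple) instead of the Series itself.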

One way to solve this is to apply var_func to the column instead:

@lru_cache(maxsize = None)
def var_func(value, n):
    # value is a single EndDate scalar, which is hashable
    return value*100*n

df['annfact'] = df['EndDate'].apply(var_func, n=10)
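
To check that the cache is actually being used, you can inspect cache_info() after the call. The sketch below uses a made-up frame with repeated EndDate values (not the data from the question) so that some calls can be served from the cache:

```python
from functools import lru_cache
import pandas as pd

@lru_cache(maxsize=None)
def var_func(value, n):
    # value is one scalar from the column, so it is hashable
    return value * 100 * n

# Hypothetical data with repeated values, for illustration only
df = pd.DataFrame({
    "contract": ["A00118"] * 4,
    "EndDate": [123456, 12345, 123456, 12345],
})

df["annfact"] = df["EndDate"].apply(var_func, n=10)

info = var_func.cache_info()
print(df)
print(info)  # hits, misses and current cache size
```

Only the two distinct EndDate values end up stored in the cache; repeated values are looked up rather than recomputed.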

For your specific example, a vectorized operation would be better:

df['annfact'] = df['EndDate']*100*10

We can also convert each row into something hashable. Since you want to keep referring to columns by name, we can use collections.namedtuple:

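from collections import namedtuple

@lru_cache(maxsize = None)
def var_func(row, n):
    # row is a namedtuple here, so fields are read as attributes
    res=row.EndDate*100*n
    return res

# Build a hashable row type whose fields are the column names
df_as_ntup = namedtuple('df_as_ntup', df.columns)
df['annfact'] = df.apply(lambda row: var_func(df_as_ntup(*row), 10), axis=1)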

Output:

  contract  EndDate    annfact
0   A00118   123456  123456000
1   A00118    12345   12345000
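
A variation that is not part of the answer above, offered as a sketch: keep df.apply(..., axis=1), but pass only the hashable scalar the function needs rather than the whole row. The parameter name end_date is an assumption made for this example:

```python
from functools import lru_cache
from io import StringIO
import pandas as pd

raw = """
contract      EndDate
A00118        123456
A00118        12345
"""
df = pd.read_csv(StringIO(raw.strip()), sep=r"\s+")

@lru_cache(maxsize=None)
def var_func(end_date, n):
    # end_date is a plain scalar taken from the row, so it can be a cache key
    return end_date * 100 * n

# Still a row-wise apply; the lambda extracts the field before the cached call
df["annfact"] = df.apply(lambda row: var_func(row["EndDate"], 10), axis=1)
print(df)
```

One thing to keep in mind: when the whole row is the cache key (as with the namedtuple), a cached result is reused only if every field of a row repeats. Keying on just the fields the function reads will usually give more cache hits.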
