Strange error: raise InvalidIndexError(key)

I'm trying this:

    import dask.dataframe as dd
    import pandas as pd
    
    salary_df = pd.DataFrame({"Salary":[10000, 50000, 25000, 30000, 7000, 100000]})
    salary_category = pd.DataFrame({"Hi":[5000, 20000, 25000, 30000, 90000, 120000],
                            "Low":[0,  5001, 20001, 25001, 30001, 90001],
                            "category":["Very Poor", "Poor", "Medium", "Rich", "Super Rich", "Ultra Rich" ]
                            })
    sal_ddf = dd.from_pandas(salary_df, npartitions=10)
    salary_category.index = pd.IntervalIndex.from_arrays(salary_category['Low'],salary_category['Hi'],closed='both')
    sal_ddf['Category'] = sal_ddf['Salary'].map_partitions(lambda x : salary_category.iloc[salary_category.index.get_loc(x)]['category'], meta=('Category', 'str'))
    
    print(salary_category)
    print(sal_ddf.head())

My output for salary_category is:

                         Hi    Low    category
    [0, 5000]          5000      0   Very Poor
    [5001, 20000]     20000   5001        Poor
    [20001, 25000]    25000  20001      Medium
    [25001, 30000]    30000  25001        Rich
    [30001, 90000]    90000  30001  Super Rich
    [90001, 120000]  120000  90001  Ultra Rich

Shouldn't 10000 fall into the "Poor" category? Yet I still get this index error:

        sal_ddf['Category'] = sal_ddf['Salary'].map_partitions(lambda x : salary_category.iloc[salary_category.index.get_loc(x)]['category'], meta=('Category', 'str'))
      File "C:\Python\Python310\lib\site-packages\pandas\core\indexes\interval.py", line 613, in get_loc
        raise InvalidIndexError(key)
    pandas.errors.InvalidIndexError: 0    10000

Why is this key invalid?
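To isolate the cause, the sketch below uses plain pandas (no dask) and only two of the question's intervals; `idx` and the helper `try_get_loc` are illustrative names, not part of the original code. It shows how `IntervalIndex.get_loc` treats three kinds of key:

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# Two of the question's intervals, closed on both ends: [0, 5000] and [5001, 20000].
idx = pd.IntervalIndex.from_arrays([0, 5001], [5000, 20000], closed="both")


def try_get_loc(key):
    """Return idx.get_loc(key), or the name of the exception it raises."""
    try:
        return idx.get_loc(key)
    except (KeyError, InvalidIndexError) as exc:
        return type(exc).__name__


# A scalar inside [5001, 20000] is found at position 1.
print(try_get_loc(10000))  # 1
# A scalar in the gap between the intervals is a genuine KeyError.
print(try_get_loc(5000.5))  # KeyError
# A whole Series (what map_partitions hands the lambda) is rejected outright.
print(try_get_loc(pd.Series([10000])))  # InvalidIndexError
```

Only the third case matches the traceback: inside `map_partitions` the lambda receives an entire pandas Series, so `get_loc` never sees the scalar 10000. A real `KeyError` would instead mean that no interval contains the value.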

Answer:

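`.map_partitions` calls the function once per partition and passes the whole partition as a pandas object. Here that object is a pandas Series, so the lambda hands an entire Series to `salary_category.index.get_loc`, which only accepts a single scalar key and therefore raises `InvalidIndexError`. A quick fix is to define a custom function that applies the lookup to each value inside the partition, and pass that function to `.map_partitions`: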

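    import dask.dataframe as dd
    import pandas as pd

    # salary_df and salary_category are defined as in the question
    sal_ddf = dd.from_pandas(salary_df, npartitions=10)
    salary_category.index = pd.IntervalIndex.from_arrays(salary_category['Low'], salary_category['Hi'], closed='both')


    def get_salary(df):
        # Work on a copy so the partition handed over by dask is not mutated
        df = df.copy()
        # Series.apply calls the lambda once per value, so get_loc receives a scalar
        df['category'] = df['Salary'].apply(lambda x: salary_category.iloc[salary_category.index.get_loc(x)]['category'])
        return df


    sal_ddf = sal_ddf.map_partitions(get_salary)

    print(salary_category)
    print(sal_ddf.compute())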
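For larger data, a vectorized variant avoids calling `get_loc` once per row: `IntervalIndex.get_indexer` accepts an array of values and returns the position of the matching interval for each one (or -1 when no interval matches), provided the intervals do not overlap. The following is a minimal pandas-only sketch; the function name `categorize` and the choice of `None` as the label for unmatched salaries are assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Lookup table from the question: closed intervals [Low, Hi] -> category label.
salary_category = pd.DataFrame({
    "Hi": [5000, 20000, 25000, 30000, 90000, 120000],
    "Low": [0, 5001, 20001, 25001, 30001, 90001],
    "category": ["Very Poor", "Poor", "Medium", "Rich", "Super Rich", "Ultra Rich"],
})
intervals = pd.IntervalIndex.from_arrays(
    salary_category["Low"], salary_category["Hi"], closed="both"
)
labels_by_position = salary_category["category"].to_numpy()


def categorize(salaries: pd.Series) -> pd.Series:
    """Map a whole Series of salaries to category labels in one call."""
    # Unlike get_loc, get_indexer takes an array-like; -1 marks "no interval".
    positions = intervals.get_indexer(salaries)
    # Look up labels by position; unmatched salaries (position -1) become None.
    labels = np.where(positions >= 0, labels_by_position[positions], None)
    # Keep the partition's own index so dask can align the result.
    return pd.Series(labels, index=salaries.index, name="Category", dtype=object)


print(categorize(pd.Series([10000, 50000, 25000, 30000, 7000, 100000])))
```

Under these assumptions it could be applied per partition with something like `sal_ddf['Salary'].map_partitions(categorize, meta=('Category', 'object'))` (untested here); passing `meta` explicitly spares dask from inferring the output type by running the function on dummy data.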

Source: https://www.uj5u.com/gongcheng/350802.html

