How to clean a pandas DataFrame using 1 to n conditions stored in a dict?

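I want to clean a pandas DataFrame using conditions defined in a dict-like structure. There can be anywhere from 1 to n conditions.

I have looked into numpy.where(), but I don't know how to build the required nested conditions programmatically.

How can I achieve this?

Here is some sample code: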

import pandas as pd
import numpy as np

d = [
    ["apple", "square", "green"],
    ["apple", "round", "blue"],
    ["orange", "long", "yellow"],
]

df = pd.DataFrame(d, columns=["fruit", "shape", "color"])

# conditions
# change fruit to "blueberry" when shape == "round" AND color == "blue"
# change fruit to "banana" when fruit == "orange" AND shape == "long" AND color == "yellow"
# change shape to "round" when fruit == "apple" and color == "green"

print(df)

# this works but cannot be abstracted, or can it?
df["fruit"] = np.where(
    df["shape"] == "round",
    np.where(df["color"] == "blue", "blueberry", df["fruit"]),
    df["fruit"],
)

print(df)

# example for rules, alternatives for formatting are also welcome
rules = [
    [
        {
            "condition": [
                {
                    "shape": "round",
                    "color": "blue",
                }
            ],
            "result": [{"fruit": "blueberry"}],
        }
    ],
    [
        {
            "condition": [
                {
                    "fruit": "orange",
                    "shape": "long",
                }
            ],
            "result": [{"fruit": "banana"}],
        }
    ],
    [
        {
            "condition": [
                {
                    "fruit": "apple",
                    "color": "green",
                }
            ],
            "result": [{"shape": "round"}],
        }
    ],
]
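The nested np.where above is equivalent to a single np.where on one combined boolean mask, and that mask can be built in a loop from a dict of column/value pairs. A minimal sketch, assuming every condition is an equality test, that conditions are joined with AND, and that each dict has at least one pair; the helper name `build_mask` is hypothetical:

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["apple", "square", "green"], ["apple", "round", "blue"], ["orange", "long", "yellow"]],
    columns=["fruit", "shape", "color"],
)


def build_mask(frame, condition):
    """Hypothetical helper: AND together one equality test per (column, value) pair."""
    tests = (frame[col] == value for col, value in condition.items())
    return reduce(operator.and_, tests)


# Same effect as the nested np.where, but the condition now comes from a dict
mask = build_mask(df, {"shape": "round", "color": "blue"})
df["fruit"] = np.where(mask, "blueberry", df["fruit"])
print(df)
```

Because the mask is built from a dict, adding a third or fourth condition only means adding a key, not another level of nesting.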

Answer:

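You can use the rules data structure from your example, and the code below shows how. IMHO, though, it uses lists where they add no value; I would go with a list of dicts without embedded lists.
I have flattened it so that the code starts from a structure of tuples, each defining a condition and its result: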

[({'shape': 'round', 'color': 'blue'}, ('fruit', 'blueberry')),
 ({'fruit': 'orange', 'shape': 'long'}, ('fruit', 'banana')),
 ({'fruit': 'apple', 'color': 'green'}, ('shape', 'round'))]
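The flattening step on its own, as a sketch: each condition dict is paired with its result dict and turned into a `(condition, (column, value))` tuple. This assumes each `result` dict holds exactly one column/value pair, which is true for the example rules:

```python
rules = [
    [{"condition": [{"shape": "round", "color": "blue"}], "result": [{"fruit": "blueberry"}]}],
    [{"condition": [{"fruit": "orange", "shape": "long"}], "result": [{"fruit": "banana"}]}],
    [{"condition": [{"fruit": "apple", "color": "green"}], "result": [{"shape": "round"}]}],
]

# Walk the outer list, then each inner list of rule dicts, pairing conditions with results
flat = [
    (cond, next(iter(res.items())))  # assumes one (column, value) pair per result dict
    for group in rules
    for rule in group
    for cond, res in zip(rule["condition"], rule["result"])
]

for item in flat:
    print(item)
```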

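Then it is a case of building an expression string for pandas query() to filter to the matching rows,
and updating the defined column to the defined value with loc[].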
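To make the query step concrete: for one condition dict, the join produces a string such as `(shape=='round') & (color=='blue')`, `df.query()` returns the matching rows, and their index labels are passed to `loc[]`. A small sketch; note that this string building assumes plain column names and values that contain no quote characters:

```python
import pandas as pd

df = pd.DataFrame(
    [["apple", "square", "green"], ["apple", "round", "blue"], ["orange", "long", "yellow"]],
    columns=["fruit", "shape", "color"],
)

cond = {"shape": "round", "color": "blue"}
# One parenthesised equality per column, joined with the query() AND operator
expr = " & ".join(f"({c}=='{v}')" for c, v in cond.items())
print(expr)  # (shape=='round') & (color=='blue')

# Index labels of the rows matched by the expression; loc[] updates only these rows
rows = df.query(expr).index
df.loc[rows, "fruit"] = "blueberry"
print(df)
```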

import pandas as pd
import numpy as np

d = [
    ["apple", "square", "green"],
    ["apple", "round", "blue"],
    ["orange", "long", "yellow"],
]

df = pd.DataFrame(d, columns=["fruit", "shape", "color"])

# example for rules, alternatives for formatting are also welcome
rules = [
    [
        {
            "condition": [
                {
                    "shape": "round",
                    "color": "blue",
                }
            ],
            "result": [{"fruit": "blueberry"}],
        }
    ],
    [
        {
            "condition": [{"fruit": "orange", "shape": "long"}],
            "result": [{"fruit": "banana"}],
        }
    ],
    [
        {
            "condition": [{"fruit": "apple", "color": "green"}],
            "result": [{"shape": "round"}],
        }
    ],
]

# flatten out all those nested lists in rules data structure
for cond, (col, val) in [
    (cond,) + tuple(res.items())
    for r in rules
    for u in r
    for cond, res in zip(u["condition"], u["result"])
]:
    df.loc[
        df.query(" & ".join([f"({c}=='{v}')" for c, v in cond.items()])).index,
        col,
    ] = val

print(df)

       fruit  shape   color
0      apple  round   green
1  blueberry  round    blue
2     banana   long  yellow
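An alternative sketch that follows the suggestion of a flat list of dicts and avoids building query strings, so values containing quotes or column names with spaces cause no trouble: compare the selected columns against the condition values and keep the rows where every comparison is true. The flat `rules` layout below is an assumed format, not the one from the question:

```python
import pandas as pd

df = pd.DataFrame(
    [["apple", "square", "green"], ["apple", "round", "blue"], ["orange", "long", "yellow"]],
    columns=["fruit", "shape", "color"],
)

# Assumed flat format: one dict per rule, no nested lists
rules = [
    {"condition": {"shape": "round", "color": "blue"}, "result": {"fruit": "blueberry"}},
    {"condition": {"fruit": "orange", "shape": "long"}, "result": {"fruit": "banana"}},
    {"condition": {"fruit": "apple", "color": "green"}, "result": {"shape": "round"}},
]

for rule in rules:
    cond = rule["condition"]
    # Compare each listed column with its target value (the Series aligns on column labels),
    # then require every comparison in a row to be True
    mask = df[list(cond)].eq(pd.Series(cond)).all(axis=1)
    # A result dict may set several columns at once
    for col, val in rule["result"].items():
        df.loc[mask, col] = val

print(df)
```

Rules are applied in list order, so a later rule sees values written by earlier ones (here, row 1 is already "blueberry" when the apple rule runs); the loop in the answer behaves the same way.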
