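I want to clean a pandas DataFrame using conditions defined in a dict-like structure. There can be anywhere from 1 to n conditions.
I looked into numpy.where(), but I cannot work out how to build the required nested conditions programmatically.
How can I achieve this?
Here is some sample code: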
import pandas as pd
import numpy as np
d = [
    ["apple", "square", "green"],
    ["apple", "round", "blue"],
    ["orange", "long", "yellow"],
]
df = pd.DataFrame(d, columns=["fruit", "shape", "color"])
# conditions
# change fruit to "blueberry" when shape == "round" AND color == "blue"
# change fruit to "banana" when fruit == "orange" AND shape == "long" AND color == "yellow"
# change shape to "round" when fruit == "apple" and color == "green"
print(df)
# this works but cannot be abstracted, or can it?
df["fruit"] = np.where(
    df["shape"] == "round",
    np.where(df["color"] == "blue", "blueberry", df["fruit"]),
    df["fruit"],
)
print(df)
# example for rules, alternatives for formatting are also welcome
rules = [
    [
        {
            "condition": [
                {
                    "shape": "round",
                    "color": "blue",
                }
            ],
            "result": [{"fruit": "blueberry"}],
        }
    ],
    [
        {
            "condition": [
                {
                    "fruit": "orange",
                    "shape": "long"
                }
            ],
            "result": [{"fruit": "banana"}],
        }
    ],
    [
        {
            "condition": [
                {
                    "fruit": "apple",
                    "color": "green"
                }
            ],
            "result": [{"shape": "round"}],
        }
    ],
]
Answer:
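- Your sample rules data structure can be used as it is; the code below shows how. IMHO, though, it contains lists where they add no value. I would go with a list of dicts, with no embedded lists.
- I have flattened it to start with a list of tuples, each defining a condition and a result: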
[({'shape': 'round', 'color': 'blue'}, ('fruit', 'blueberry')),
({'fruit': 'orange', 'shape': 'long'}, ('fruit', 'banana')),
({'fruit': 'apple', 'color': 'green'}, ('shape', 'round'))]
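- Then it is a matter of building an expression string for pandas query().
- Filter to the identified rows and update the defined column to the defined value using loc[].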
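To make the string-building step concrete, here is a minimal sketch for a single condition. The variable names `cond`, `expr`, and `matched` are illustrative only, and the sketch assumes all condition values are plain strings that contain no single quotes.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["apple", "square", "green"],
        ["apple", "round", "blue"],
        ["orange", "long", "yellow"],
    ],
    columns=["fruit", "shape", "color"],
)

# One condition dict: every key/value pair must match (logical AND).
cond = {"shape": "round", "color": "blue"}

# Join one "(column=='value')" term per pair into a single query expression.
expr = " & ".join(f"({c}=='{v}')" for c, v in cond.items())
print(expr)  # (shape=='round') & (color=='blue')

# query() returns the matching rows; their index can then be passed to loc[].
matched = df.query(expr).index
print(list(matched))  # [1]
```

Because the values are interpolated into the expression as quoted strings, this approach would need adjusting for numeric columns or for values that contain quotes.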
import pandas as pd
import numpy as np
d = [
    ["apple", "square", "green"],
    ["apple", "round", "blue"],
    ["orange", "long", "yellow"],
]
df = pd.DataFrame(d, columns=["fruit", "shape", "color"])
# example for rules, alternatives for formatting are also welcome
rules = [
    [
        {
            "condition": [
                {
                    "shape": "round",
                    "color": "blue",
                }
            ],
            "result": [{"fruit": "blueberry"}],
        }
    ],
    [
        {
            "condition": [{"fruit": "orange", "shape": "long"}],
            "result": [{"fruit": "banana"}],
        }
    ],
    [
        {
            "condition": [{"fruit": "apple", "color": "green"}],
            "result": [{"shape": "round"}],
        }
    ],
]
# flatten out all those nested lists in rules data structure
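for cond, (col, val) in [
    # each item becomes (condition_dict, (column, value))
    (cond,) + tuple(res.items())
    for r in rules
    for u in r
    for cond, res in zip(u["condition"], u["result"])
]:
    # build the query string, find the matching rows, then update the target column
    df.loc[
        df.query(" & ".join([f"({c}=='{v}')" for c, v in cond.items()])).index,
        col,
    ] = val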
df
| | fruit | shape | color |
|---|---|---|---|
| 0 | apple | round | green |
| 1 | blueberry | round | blue |
| 2 | banana | long | yellow |
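For comparison, here is a sketch of the flat list-of-dicts layout suggested above, applied with boolean masks instead of query strings. The `flat_rules` format and the `apply_rules` helper are assumptions made for illustration; they are not part of the original question or answer. The sketch also assumes every rule has at least one condition.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame(
    [
        ["apple", "square", "green"],
        ["apple", "round", "blue"],
        ["orange", "long", "yellow"],
    ],
    columns=["fruit", "shape", "color"],
)

# Hypothetical flat rule format: one dict per rule, no embedded lists.
flat_rules = [
    {"condition": {"shape": "round", "color": "blue"}, "result": {"fruit": "blueberry"}},
    {"condition": {"fruit": "orange", "shape": "long"}, "result": {"fruit": "banana"}},
    {"condition": {"fruit": "apple", "color": "green"}, "result": {"shape": "round"}},
]


def apply_rules(frame, rules):
    """Apply each rule in order and return a modified copy of the frame."""
    out = frame.copy()
    for rule in rules:
        # AND together one equality test per condition column.
        mask = reduce(
            operator.and_,
            (out[c] == v for c, v in rule["condition"].items()),
        )
        # Write every result value into the rows selected by the mask.
        for col, val in rule["result"].items():
            out.loc[mask, col] = val
    return out


result = apply_rules(df, flat_rules)
print(result)
```

Comparing columns directly avoids quoting problems when a value contains a quote character, and it works for non-string values as well. As in the loop above, rules are applied in order, so a later rule sees the changes made by earlier ones.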
