I have the following code:
from pyspark.sql.functions import col, count, when
from functools import reduce
df = spark.createDataFrame([ (1,""), (2,None),(3,"c"),(4,"d") ], ['id','name'])
filter1 = col("name").isNull()
filter2 = col("name") == ""
dfresult = df.filter(filter1 | filter2).select(col("id"), when(filter1, "name is null").when(filter2, "name is empty").alias("new_col"))
dfresult.show()
+---+-------------+
| id|      new_col|
+---+-------------+
|  1|name is empty|
|  2| name is null|
+---+-------------+
For a scenario with N filters, I came up with this:
filters = []
filters.append({ "item": filter1, "msg":"name is null"})
filters.append({ "item": filter2, "msg":"name is empty"})
dynamic_filter = reduce(
lambda x,y: x | y,
[s['item'] for s in filters]
)
df2 = df.filter(dynamic_filter).select(col("id"), when(filter1, "name is null").when(filter2, "name is empty").alias("new_col"))
df2.show()
How can I also build the when chain for the new_col column dynamically?
Answer from a uj5u.com community member:
Just use functools.reduce, the same way you did for the filter expression:
from functools import reduce
from pyspark.sql import functions as F
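new_col = reduce(
    # The first step calls F.when(...); every later step calls Column.when(...) on the result.
    lambda acc, x: acc.when(x["item"], F.lit(x["msg"])),
    filters,
    F  # initial value: the functions module itself, which exposes when()
)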
df2 = df.filter(dynamic_filter).select(col("id"), new_col.alias("new_col"))
df2.show()
# +---+-------------+
# | id|      new_col|
# +---+-------------+
# |  1|name is empty|
# |  2| name is null|
# +---+-------------+
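To see why passing F as the initial value works, here is a minimal pure-Python sketch of the same fold. Trace is a hypothetical stand-in (it is not part of PySpark) that only records each when() call, and build_when_chain is an illustrative helper name, not something from the original answer. The only thing the fold requires is that the seed and every intermediate result expose a when(condition, value) method.

```python
from functools import reduce


class Trace:
    """Hypothetical stand-in for pyspark.sql.functions / Column: records when() calls."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def when(self, condition, value):
        # Like Column.when, return a new object so the chain can continue.
        return Trace(self.calls + ((condition, value),))


def build_when_chain(rules, seed):
    """Fold rules into seed.when(...).when(...)...; seed only needs a when() method."""
    return reduce(lambda acc, rule: acc.when(rule["item"], rule["msg"]), rules, seed)


# Plain strings stand in for the Column conditions used in the question.
rules = [
    {"item": "name IS NULL", "msg": "name is null"},
    {"item": "name = ''", "msg": "name is empty"},
]

chain = build_when_chain(rules, Trace())
for condition, message in chain.calls:
    print(condition, "->", message)
```

With PySpark, the equivalent call would presumably be build_when_chain(filters, F). Two things to keep in mind: the rules are applied in list order, so the first matching condition wins, and if the list is empty reduce simply returns the seed, which with F would be the module rather than a Column, so an empty filter list needs its own guard.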
