我正在嘗試使用when條件更改列值。
條件運行良好,when列正在更新為“DJOUR”和“DECBA”的相應文字值,它們的長度為 5 個字符,但該when條件不適用于“DEP”、“BUY”等其他值, "WITH"... 長度小于 5 個字符。
為什么when條件無法識別較小長度的字串值?另外,如果我使用的when條件不正確,請糾正我。
df1 = df.withColumn("selling_type", when(col("selling_type") == "BUY", lit("ABCDEF1"))\
.when(col("selling_type") == "CJOUR", lit("ABCDEF2"))\
.when(col("selling_type") == "DEP", lit("ABCDEF3"))\ # not working
.when(col("selling_type") == "DIV", lit("ABCDEF4"))\ # not working
.when(col("selling_type") == "DJOUR", lit("ABCDEF5"))\ # working
.when(col("selling_type") == "DECBA", lit("ABCDEF6"))\ # working
.when(col("selling_type") == "MFEE", lit("ABCDEF7"))\
.when(col("selling_type") == "SELL", lit("ABCDEF8"))\
.when(col("selling_type") == "TEFRA", lit("ABCDEF9"))\
.when(col("selling_type") == "WITH", lit("ABCDEF10"))\
.otherwise(col("selling_type")))
uj5u.com熱心網友回復:
嘗試以下操作,它將修剪字串兩側所有可見和不可見的空白區域。此外,您可以使用更簡單的方法更改所有重復的when子句。replace
輸入資料框:
from pyspark.sql import functions as F
df = spark.createDataFrame([("BUY",), ("CJOUR",), ("DEP",), ("DIV",), ("DJOUR",), ("DECBA",), ("MFEE",), ("SELL",), ("TEFRA",), ("WITH",)], ["selling_type"])
腳本:
df = df.withColumn("selling_type", F.regexp_replace("selling_type", r"^\s |\s $", ""))
df = df.replace({
"BUY": "ABCDEF1",
"CJOUR": "ABCDEF2",
"DEP": "ABCDEF3",
"DIV": "ABCDEF4",
"DJOUR": "ABCDEF5",
"DECBA": "ABCDEF6",
"MFEE": "ABCDEF7",
"SELL": "ABCDEF8",
"TEFRA": "ABCDEF9",
"WITH": "ABCDEF10",
}, subset=["selling_type"])
df.show()
# ------------
# |selling_type|
# ------------
# | ABCDEF1|
# | ABCDEF2|
# | ABCDEF3|
# | ABCDEF4|
# | ABCDEF5|
# | ABCDEF6|
# | ABCDEF7|
# | ABCDEF8|
# | ABCDEF9|
# | ABCDEF10|
# ------------
uj5u.com熱心網友回復:
.when()確實適用于任意長度的字串。您能否檢查您的列是否真的包含“DEP”,開頭/結尾沒有任何空格等?
以下代碼適用于我:
>>> df = spark.createDataFrame(["CJOUR", "DEP"], StringType())
>>> df.printSchema()
root
|-- value: string (nullable = true)
>>> df = df.withColumnRenamed("value", "selling_type")
>>> from pyspark.sql.functions import when, col, lit
>>> df1 = df.withColumn("selling_type", trim(col("selling_type")).withColumn("selling_type", when(col("selling_type") == "BUY", lit("ABCDEF1"))
.when(col("selling_type") == "CJOUR", lit("ABCDEF2"))
.when(col("selling_type") == "DEP", lit("ABCDEF3"))
.when(col("selling_type") == "DIV", lit("ABCDEF4"))
.when(col("selling_type") == "DJOUR", lit("ABCDEF5"))
.when(col("selling_type") == "DECBA", lit("ABCDEF6"))
.when(col("selling_type") == "MFEE", lit("ABCDEF7"))
.when(col("selling_type") == "SELL", lit("ABCDEF8"))
.when(col("selling_type") == "TEFRA", lit("ABCDEF9"))
.when(col("selling_type") == "WITH", lit("ABCDEF10"))
.otherwise(col("selling_type")))
>>>
>>> df1.show()
------------
|selling_type|
------------
| ABCDEF2|
| ABCDEF3|
------------
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/509907.html
