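In the data below, taken from a CSV file delimited by |, I want to convert the PersonalInfo string column to a Map column so that I can extract the information I need.
I tried converting the CSV below to Parquet format and turning the String into a Map with cast, but I got a data type mismatch error.
The data is below for your reference. Any help is much appreciated.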
Empcode EmpName PersonalInfo
1 abc """email"":""[email protected]"",""Location"":""India"",""Gender"":""Male"""
2 xyz """email"":""[email protected]"",""Location"":""US"""
3 pqr """email"":""[email protected]"",""Gender"":""Female"",""Location"":""Europe"",""Mobile"":""1234"""
Thanks
Answer from a uj5u.com user:
A simple approach is to use the str_to_map function after removing the double quotes from the PersonalInfo column:
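import org.apache.spark.sql.functions.expr

// Strip every double quote, then let str_to_map split on its default
// delimiters: ',' between pairs and ':' between key and value.
val df1 = df.withColumn(
  "PersonalInfo",
  expr("str_to_map(regexp_replace(PersonalInfo, '\"', ''))")
)
df1.show(false)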
//+-------+-------+------------------------------------------------------------------------------+
//|Empcode|EmpName|PersonalInfo |
//+-------+-------+------------------------------------------------------------------------------+
//|1 |abc |{email -> [email protected], Location -> India, Gender -> Male} |
//|2 |xyz |{email -> [email protected], Location -> US} |
//|3 |pqr |{email -> [email protected], Gender -> Female, Location -> Europe, Mobile -> 1234}|
//+-------+-------+------------------------------------------------------------------------------+
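Once the column is a map, individual values can be read by key, which is what the question is ultimately after. The following is a minimal sketch, not part of the original answer: the object name MapLookupSketch, the helper extract, the local SparkSession settings and the example.com e-mail addresses are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr}

object MapLookupSketch {
  // Hypothetical helper: builds a two-row stand-in for the question's data
  // (placeholder e-mail addresses), converts PersonalInfo to a map and
  // reads two keys out of it.
  def extract(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df = Seq(
      (1, "abc", "\"email\":\"abc@example.com\",\"Location\":\"India\",\"Gender\":\"Male\""),
      (2, "xyz", "\"email\":\"xyz@example.com\",\"Location\":\"US\"")
    ).toDF("Empcode", "EmpName", "PersonalInfo")

    // Same conversion as in the answer above.
    val df1 = df.withColumn(
      "PersonalInfo",
      expr("str_to_map(regexp_replace(PersonalInfo, '\"', ''))")
    )

    // Look up individual keys in the map column.
    df1.select(
      col("Empcode"),
      col("PersonalInfo").getItem("Location").as("Location"),
      col("PersonalInfo").getItem("Gender").as("Gender")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-lookup-sketch")
      .getOrCreate()
    extract(spark).show(false)
    spark.stop()
  }
}
```

With default settings, a row whose map lacks the requested key (Gender for Empcode 2 here) should come back as null rather than fail, so it may be worth handling nulls downstream.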
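Answer from a uj5u.com user:
If you want to build a map from the PersonalInfo column, from Spark 3.0 onward you can proceed as follows:
- Split your string on "","" using the split function
- For each element of the resulting array of strings, create a sub-array by splitting on "":"" with the split function
- Remove every "" from the elements of the sub-array using the regexp_replace function
- Build the map entries using the struct function
- Use map_from_entries to build the map from your array of entries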
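To make the steps concrete, they can be traced on a single value in plain Scala, without Spark. This is only an illustrative sketch: the object name SplitStepsSketch, the helper parse and the example.com address are hypothetical, and unlike the Spark expression it splits each entry with a limit of 2 and silently drops entries that have no key/value separator.

```scala
object SplitStepsSketch {
  // Hypothetical helper mirroring the five steps on one raw string value.
  def parse(raw: String): Map[String, String] =
    raw
      .split("\"\",\"\"")              // step 1: split the entries on "",""
      .map(_.split("\"\":\"\"", 2))    // step 2: split each entry on "":""
      .collect { case Array(k, v) =>   // keep only well-formed key/value pairs
        // steps 3 and 4: strip leftover "" and pair key with value
        k.replace("\"\"", "") -> v.replace("\"\"", "")
      }
      .toMap                           // step 5: build the map from the entries

  def main(args: Array[String]): Unit = {
    // Placeholder value shaped like the PersonalInfo column, with doubled quotes.
    val raw = "\"\"email\"\":\"\"abc@example.com\"\",\"\"Location\"\":\"\"India\"\""
    println(parse(raw))
  }
}
```

The Spark version does the same work per row, with transform playing the role of map and map_from_entries the role of toMap.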
The complete code is as follows:
import org.apache.spark.sql.functions.{col, map_from_entries, regexp_replace, split, struct, transform}

val result = data.withColumn("PersonalInfo",
  map_from_entries(
    transform(
      // Split the raw string into one element per key/value pair.
      split(col("PersonalInfo"), "\"\",\"\""),
      // Split each pair into key and value, strip leftover "" and wrap both in a struct.
      item => struct(
        regexp_replace(split(item, "\"\":\"\"")(0), "\"\"", ""),
        regexp_replace(split(item, "\"\":\"\"")(1), "\"\"", "")
      )
    )
  )
)
With the following input dataframe:
+-------+-------+---------------------------------------------------------------------------------------------+
|Empcode|EmpName|PersonalInfo |
+-------+-------+---------------------------------------------------------------------------------------------+
|1 |abc |""email"":""[email protected]"",""Location"":""India"",""Gender"":""Male"" |
|2 |xyz |""email"":""[email protected]"",""Location"":""US"" |
|3 |pqr |""email"":""[email protected]"",""Gender"":""Female"",""Location"":""Europe"",""Mobile"":""1234""|
+-------+-------+---------------------------------------------------------------------------------------------+
You get the following result dataframe:
+-------+-------+------------------------------------------------------------------------------+
|Empcode|EmpName|PersonalInfo |
+-------+-------+------------------------------------------------------------------------------+
|1 |abc |{email -> [email protected], Location -> India, Gender -> Male} |
|2 |xyz |{email -> [email protected], Location -> US} |
|3 |pqr |{email -> [email protected], Gender -> Female, Location -> Europe, Mobile -> 1234}|
+-------+-------+------------------------------------------------------------------------------+
Tags: scala apache-spark spark apache-spark-sql
