I have the following data:
// Assumes a SparkSession named spark; toDF comes from its implicits
import spark.implicits._
val df = Seq(
(1, List("A")),
(2, List("A")),
(3, List("A", "B")),
(4, List("C")),
(5, List("A")),
(6, List("A", "C")),
(7, List("B")),
(8, List("A", "B", "C")),
(9, List("A"))
).toDF("Serial Number", "my_list")
+-------------+---------+
|Serial Number|  my_list|
+-------------+---------+
|            1|      [A]|
|            2|      [A]|
|            3|   [A, B]|
|            4|      [C]|
|            5|      [A]|
|            6|   [A, C]|
|            7|      [B]|
|            8|[A, B, C]|
|            9|      [A]|
+-------------+---------+
I have a map:
val category_Mapping = Map("Category1" -> List("A", "B"),
  "Category2" -> List("C"),
  "Category3" -> List("B", "D"))
I want to look up each list element in data["my_list"] and, for each data["Serial Number"], return an output map like this:
+-------------+---------+------------------------------------------+
|Serial Number|  my_list|                                    output|
+-------------+---------+------------------------------------------+
|            1|      [A]|{Category1->1, Category2->0, Category3->0}|
|            2|      [A]|{Category1->1, Category2->0, Category3->0}|
|            3|   [A, B]|{Category1->1, Category2->0, Category3->1}|
|            4|      [C]|{Category1->0, Category2->1, Category3->0}|
|            5|      [A]|{Category1->1, Category2->0, Category3->0}|
|            6|   [A, C]|{Category1->1, Category2->1, Category3->0}|
|            7|      [B]|{Category1->1, Category2->0, Category3->1}|
|            8|[A, B, C]|{Category1->1, Category2->1, Category3->1}|
|            9|      [A]|{Category1->1, Category2->0, Category3->0}|
+-------------+---------+------------------------------------------+
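Basically, for each category I want the output map to hold 1 if any element of the list in data["my_list"] appears in that category's list in category_Mapping, and 0 otherwise. Is there any way I can do this?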
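The rule above boils down to a per-row set-intersection test. Here is a minimal plain-Scala sketch of it (no Spark), useful for checking the expected table before wiring it into a DataFrame. The names `CategoryFlags`, `categoryMapping` and `categorize` are hypothetical, chosen for illustration only.

```scala
// Hypothetical helper illustrating the rule: a category gets 1 when the row's
// list shares at least one element with that category's members, else 0.
object CategoryFlags {
  // Same categories as category_Mapping in the question
  val categoryMapping: Map[String, Seq[String]] = Map(
    "Category1" -> Seq("A", "B"),
    "Category2" -> Seq("C"),
    "Category3" -> Seq("B", "D")
  )

  def categorize(items: Seq[String], mapping: Map[String, Seq[String]]): Map[String, Int] =
    mapping.map { case (category, members) =>
      // exists stops at the first element found in this category
      category -> (if (items.exists(item => members.contains(item))) 1 else 0)
    }
}
```

For example, `CategoryFlags.categorize(Seq("B"), CategoryFlags.categoryMapping)` flags both Category1 and Category3, matching row 7 of the expected output.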
Edit: It has been about 5 hours and nobody has answered. Could someone help me?
Reply from a uj5u.com user:
You can try this. I ran it in Spark local mode, not on a cluster.
// Assuming that your DataFrame is stored in a variable called df
import org.apache.spark.sql.functions.{col, udf}

// Define a function that returns your map based on the array in the column 'my_list'.
// Seq[String] works for Spark array columns on Scala 2.12 and 2.13 (WrappedArray is deprecated in 2.13).
def function(lst: Seq[String]): Map[String, Int] = {
  // Category definitions, built once per call instead of once per element and key
  val categories = Map("Category1" -> Array("A", "B"), "Category2" -> Array("C"), "Category3" -> Array("B", "D"))
  val map = scala.collection.mutable.Map("Category1" -> 0, "Category2" -> 0, "Category3" -> 0)
  lst.foreach { l =>
    categories.foreach { case (key, values) =>
      if (values.contains(l))
        map(key) = 1
    }
  }
  map.toMap
}

// Now you can define a UDF that just calls the function defined above
val output = udf { (lst: Seq[String]) =>
  function(lst)
}

// Now you can call the UDF on the column 'my_list'
df.withColumn("output", output(col("my_list"))).show(false)
// The output will be as given below
+-------------+---------+------------------------------------------------+
|Serial Number|my_list  |output                                          |
+-------------+---------+------------------------------------------------+
|1            |[A]      |[Category2 -> 0, Category1 -> 1, Category3 -> 0]|
|2            |[A]      |[Category2 -> 0, Category1 -> 1, Category3 -> 0]|
|3            |[A, B]   |[Category2 -> 0, Category1 -> 1, Category3 -> 1]|
|4            |[C]      |[Category2 -> 1, Category1 -> 0, Category3 -> 0]|
|5            |[A]      |[Category2 -> 0, Category1 -> 1, Category3 -> 0]|
|6            |[A, C]   |[Category2 -> 1, Category1 -> 1, Category3 -> 0]|
|7            |[B]      |[Category2 -> 0, Category1 -> 1, Category3 -> 1]|
|8            |[A, B, C]|[Category2 -> 1, Category1 -> 1, Category3 -> 1]|
|9            |[A]      |[Category2 -> 0, Category1 -> 1, Category3 -> 0]|
+-------------+---------+------------------------------------------------+
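If you are on Spark 2.4 or later, the same map can likely be built from built-in functions with no UDF at all: `arrays_overlap` tests whether the row's array shares an element with a literal array, and `functions.map` assembles the key/value pairs. This is a swapped-in technique, not the answer's method; the object and method names below are hypothetical.

```scala
// A sketch assuming Spark 2.4+ (arrays_overlap) and the category lists from the question.
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.{functions => F}

object CategoryFlagsNative {
  val categoryMapping: Map[String, Seq[String]] = Map(
    "Category1" -> Seq("A", "B"),
    "Category2" -> Seq("C"),
    "Category3" -> Seq("B", "D")
  )

  // Builds map(key1, flag1, key2, flag2, ...): each flag is 1 when arrayCol shares
  // at least one element with that category's members, otherwise 0.
  // Keys are sorted so the output map has a stable order.
  def outputColumn(arrayCol: Column, mapping: Map[String, Seq[String]]): Column = {
    val keyValuePairs = mapping.toSeq.sortBy(_._1).flatMap { case (category, members) =>
      Seq(
        F.lit(category),
        F.when(F.arrays_overlap(arrayCol, F.typedLit(members)), 1).otherwise(0)
      )
    }
    F.map(keyValuePairs: _*)
  }

  // Usage example in local mode
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("category-flags").getOrCreate()
    import spark.implicits._
    val df = Seq((1, Seq("A")), (3, Seq("A", "B")), (7, Seq("B"))).toDF("Serial Number", "my_list")
    df.withColumn("output", outputColumn(F.col("my_list"), categoryMapping)).show(false)
    spark.stop()
  }
}
```

Staying with built-in expressions lets Catalyst see the whole computation instead of an opaque UDF, which is usually the reason to prefer it; the trade-off is that the category lists are baked into the query plan as literals.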
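To take the keys of the output map from category_Mapping instead of hard-coding them, we can pass the category_Mapping variable to the UDF as an argument and use it inside the function to build the output map dynamically. This can be done as follows: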
import org.apache.spark.sql.functions.{col, udf}

val category_Mapping = Map("Category1" -> Array("A", "B"), "Category2" -> Array("C"), "Category3" -> Array("B", "D"))

def function(lst: Seq[String], category_Mapping: Map[String, Array[String]]): Map[String, Int] = {
  // Start every category at 0 so all keys are present, even when lst is empty
  val map = scala.collection.mutable.Map(category_Mapping.keys.toSeq.map(key => key -> 0): _*)
  lst.foreach { l =>
    category_Mapping.foreach { case (key, values) =>
      if (values.contains(l))
        map(key) = 1
    }
  }
  map.toMap
}

// The definition of the UDF changes in this case: category_Mapping is captured by the closure
def output(category_Mapping: Map[String, Array[String]]) = udf { (lst: Seq[String]) =>
  function(lst, category_Mapping)
}

df.withColumn("output", output(category_Mapping)(col("my_list"))).show(false)
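Two details of the UDF approach are easy to miss: the keys of the returned map appear in whatever order the Scala map happens to iterate (note Category2 first in the output shown earlier), and a null `my_list` cell reaches the UDF as `null`. A minimal sketch that handles both, assuming you want keys sorted by category name; `OrderedCategoryFlags` and `orderedFlags` are hypothetical names.

```scala
import scala.collection.immutable.ListMap

object OrderedCategoryFlags {
  // Returns the flags with keys sorted by category name. ListMap iterates in
  // insertion order, so the sorted order is preserved. A null list (from a null
  // my_list cell) is treated like an empty one, giving all zeros.
  def orderedFlags(lst: Seq[String], categoryMapping: Map[String, Array[String]]): Map[String, Int] = {
    val items = Option(lst).getOrElse(Seq.empty[String])
    val flags = categoryMapping.toSeq.sortBy(_._1).map { case (category, members) =>
      category -> (if (items.exists(item => members.contains(item))) 1 else 0)
    }
    ListMap(flags: _*)
  }
}
```

It can be plugged into the curried UDF in the same way, e.g. `udf { (lst: Seq[String]) => OrderedCategoryFlags.orderedFlags(lst, category_Mapping) }`. As far as I understand, Spark keeps the map's iteration order when it converts the result to a map column, so the sorted keys should show up in that order.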
When reposting, please credit the source. Link to this article: https://www.uj5u.com/ruanti/432896.html
