我有以下代碼和輸出。
import org.apache.spark.sql.functions.{collect_list, struct}
import sqlContext.implicits._
val df = Seq(
("john", "tomato", 1.99),
("john", "carrot", 0.45),
("bill", "apple", 0.99),
("john", "banana", 1.29),
("bill", "taco", 2.59)
).toDF("name", "food", "price")
df.groupBy($"name")
.agg(collect_list(struct($"food", $"price")).as("foods"))
.show(false)
df.printSchema
輸出和架構:
---- ---------------------------------------------
|name|foods |
---- ---------------------------------------------
|john|[[tomato,1.99], [carrot,0.45], [banana,1.29]]|
|bill|[[apple,0.99], [taco,2.59]] |
---- ---------------------------------------------
root
|-- name: string (nullable = true)
|-- foods: array (nullable = false)
| |-- element: struct (containsNull = false)
| | |-- food: string (nullable = true)
| | |-- price: double (nullable = false)
我想根據 df("foods.price") > 1.00 進行過濾。我如何過濾它以獲得下面的輸出?
---- ---------------------------------------------
|name|foods |
---- ---------------------------------------------
|john|[[banana,1.29], [tomato,1.99]] |
|bill|[[[taco,2.59]] |
---- ---------------------------------------------
我試過了df.filter($"foods.food" > 1.00),但這不起作用,因為我遇到了錯誤。還有什么我可以嘗試的嗎?
uj5u.com熱心網友回復:
您正在嘗試對陣列應用過濾器,因此由于語法錯誤,它將引發錯誤。您可以在之前對價格應用過濾器,然后根據需要進行轉換。
val cf = df.filter("price > 1.0").groupBy($"name").agg(collect_list(struct($"food", $"price")).as("foods")

轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/427002.html
上一篇:將多列除以熊貓中的固定數字
