I exploded a column and got the following DataFrame:
+------------+-----------+--------------------+
|serialnumber|   roomname|             devices|
+------------+-----------+--------------------+
|hello       |Living Room|             device1|
|hello       |Living Room|             device2|
|hello       |Living Room|             device3|
|hello       |Living Room|             device4|
|hello       |Living Room|             device5|
|hello       |    Kitchen|             device1|
|hello       |    Kitchen|             device2|
|hello       |    Kitchen|             device3|
|hello       |    Kitchen|             device4|
|hello       |    Kitchen|             device5|
|hello       |   Bedroom1|             device1|
|hello       |   Bedroom1|             device2|
|hello       |   Bedroom1|             device3|
|hello       |   Bedroom1|             device4|
|hello       |   Bedroom1|             device5|
|hello       |  Bedroom 2|             device1|
|hello       |  Bedroom 2|             device2|
|hello       |  Bedroom 2|             device3|
|hello       |  Bedroom 2|             device4|
|hello       |  Bedroom 2|             device5|
|hello       |   Bedroom3|             device1|
|hello       |   Bedroom3|             device2|
|hello       |   Bedroom3|             device3|
|hello       |   Bedroom3|             device4|
|hello       |   Bedroom3|             device5|
+------------+-----------+--------------------+
Now I want a DataFrame like the one below, i.e. the first device for Living Room, the second for Kitchen, the third for Bedroom1, and so on:
+------------+-----------+--------------------+
|serialnumber|   roomname|             devices|
+------------+-----------+--------------------+
|hello       |Living Room|             device1|
|hello       |    Kitchen|             device2|
|hello       |   Bedroom1|             device3|
|hello       |  Bedroom 2|             device4|
|hello       |   Bedroom3|             device5|
+------------+-----------+--------------------+
Reply from a uj5u.com user:
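Here is how you can do it with groupBy and a window function, but you need to know which column to order by. Here the rooms are ordered alphabetically by roomname, which is why the output below differs from the order you asked for.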
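import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"col" syntax; assumes a SparkSession named spark

// Number the rooms within each serial number (alphabetical by roomname)
val window = Window.partitionBy("serialnumber").orderBy("roomname")

df.groupBy("serialnumber", "roomname")
  .agg(collect_list("devices").as("devices")) // all devices of a room as one array
  .withColumn("index", rank().over(window)) // 1-based position of the room
  .withColumn("devices", element_at($"devices", $"index")) // pick the index-th device
  .drop("index")
  .show(false)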
Output:
+------------+-----------+-------+
|serialnumber|roomname   |devices|
+------------+-----------+-------+
|hello       |Bedroom 2  |device1|
|hello       |Bedroom1   |device2|
|hello       |Bedroom3   |device3|
|hello       |Kitchen    |device4|
|hello       |Living Room|device5|
+------------+-----------+-------+
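If the rooms have to come out in a specific order rather than alphabetically, one option is to supply that order explicitly. The following is only a minimal sketch under stated assumptions: the lookup table `roomOrder` and the columns `roomorder` and `deviceorder` are hypothetical names introduced here, the sample data is a cut-down version of the question's, and devices are numbered by sorting their names as strings (which would misorder names such as device10). It uses `row_number` and a filter instead of `collect_list` and `element_at`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[1]").appName("room-order-sketch").getOrCreate()
import spark.implicits._

// Sample data shaped like the exploded DataFrame in the question (3 rooms x 3 devices)
val rooms = Seq("Living Room", "Kitchen", "Bedroom1")
val df = (for (r <- rooms; d <- 1 to 3) yield ("hello", r, s"device$d"))
  .toDF("serialnumber", "roomname", "devices")

// Hypothetical lookup table: the intended 1-based position of each room
val roomOrder = rooms.zipWithIndex
  .map { case (r, i) => (r, i + 1) }
  .toDF("roomname", "roomorder")

// Number the devices within each room (string sort on the device name)
val byRoom = Window.partitionBy("serialnumber", "roomname").orderBy("devices")

val picked = df.join(roomOrder, Seq("roomname"))
  .withColumn("deviceorder", row_number().over(byRoom))
  .filter($"deviceorder" === $"roomorder") // n-th device for the n-th room
  .orderBy("roomorder")
  .select("serialnumber", "roomname", "devices")

picked.show(false)
```

The join keeps the room order as plain data, so changing the order means editing the lookup table rather than the query.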
Reply from a uj5u.com user:
As I understand it, your problem is that you lose the order of the room names when you use explode.
Assuming roomname is of type Array[..], you can use posexplode instead of explode:
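import org.apache.spark.sql.functions.posexplode
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val df = Seq(
  ("hello", List[String]("room1", "room2")),
  ("hello1", List[String]("room1", "room2"))
).toDF("serial", "roomname")

// posexplode emits each array element together with its position
df.select(posexplode($"roomname")).show()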
This gives you the following output:
+---+-----+
|pos|  col|
+---+-----+
|  0|room1|
|  1|room2|
|  0|room1|
|  1|room2|
+---+-----+
Then you can add a filter on the position as needed.
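To make the posexplode-plus-filter idea concrete, here is a minimal sketch. It assumes the data before exploding had one row per serial number with two parallel arrays, `roomname` and `devices`; that shape is a guess, since the question does not show it. The column names `roompos`, `devicepos` and `device` are hypothetical. Spark allows only one generator per select clause, so the two arrays are exploded in separate steps, and the filter then keeps the rows where both positions match.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[1]").appName("posexplode-sketch").getOrCreate()
import spark.implicits._

// Assumed pre-explode shape: one row per serial number, two parallel arrays
val raw = Seq(
  ("hello", Seq("Living Room", "Kitchen", "Bedroom1"), Seq("device1", "device2", "device3"))
).toDF("serialnumber", "roomname", "devices")

// Step 1: explode the rooms, keeping each room's position in its array
val withRooms = raw.select(
  $"serialnumber", $"devices",
  posexplode($"roomname").as(Seq("roompos", "roomname")))

// Step 2: explode the devices, then keep only pairs whose positions match
val paired = withRooms
  .select($"serialnumber", $"roompos", $"roomname",
    posexplode($"devices").as(Seq("devicepos", "device")))
  .filter($"roompos" === $"devicepos")
  .orderBy("roompos")
  .select($"serialnumber", $"roomname", $"device".as("devices"))

paired.show(false)
```

Because the positions come from the arrays themselves, the original room order is preserved without any separate ordering column.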
