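I have a column that holds a list of identifiers (runways, in this case). It can be either an array or a comma-separated list; here I convert it to an array. I'm trying to find the idiomatic, programmatic way to populate a set of columns based on the contents of that array.
A working example that uses an anti-pattern: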
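import org.apache.spark.sql.functions.{array_contains, split, when}
import spark.implicits._ // assumes a SparkSession named `spark` (as in spark-shell); enables toDF and the 'col symbol syntax
val data = Seq("08L,08R,09")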
val df = data.toDF("runways")
.withColumn("runway_set", split('runways, ","))
.withColumn("runway_in_use_08L", when(array_contains('runway_set, "08L"), 1).otherwise(0))
.withColumn("runway_in_use_26R", when(array_contains('runway_set, "26R"), 1).otherwise(0))
.withColumn("runway_in_use_08R", when(array_contains('runway_set, "08R"), 1).otherwise(0))
.withColumn("runway_in_use_26L", when(array_contains('runway_set, "26L"), 1).otherwise(0))
.withColumn("runway_in_use_09", when(array_contains('runway_set, "09"), 1).otherwise(0))
.withColumn("runway_in_use_27", when(array_contains('runway_set, "27"), 1).otherwise(0))
.withColumn("runway_in_use_15L", when(array_contains('runway_set, "15L"), 1).otherwise(0))
.withColumn("runway_in_use_33R", when(array_contains('runway_set, "33R"), 1).otherwise(0))
.withColumn("runway_in_use_15R", when(array_contains('runway_set, "15R"), 1).otherwise(0))
.withColumn("runway_in_use_33L", when(array_contains('runway_set, "33L"), 1).otherwise(0))
This essentially produces one-hot encoded values, like this:
+----------+--------------+-----------------+-----------------+-----------------+-----------------+----------------+----------------+-----------------+-----------------+-----------------+-----------------+
|   runways|    runway_set|runway_in_use_08L|runway_in_use_26R|runway_in_use_08R|runway_in_use_26L|runway_in_use_09|runway_in_use_27|runway_in_use_15L|runway_in_use_33R|runway_in_use_15R|runway_in_use_33L|
+----------+--------------+-----------------+-----------------+-----------------+-----------------+----------------+----------------+-----------------+-----------------+-----------------+-----------------+
|08L,08R,09|[08L, 08R, 09]|                1|                0|                1|                0|               1|               0|                0|                0|                0|                0|
+----------+--------------+-----------------+-----------------+-----------------+-----------------+----------------+----------------+-----------------+-----------------+-----------------+-----------------+
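Each flag in that row is simply "1 if this runway appears in the set, 0 otherwise", evaluated over a fixed list of runways. A minimal plain-Scala sketch of that mapping, without Spark, just to make the encoding explicit (the RunwayOneHot object and its encode helper are hypothetical names, not part of any library):

```scala
object RunwayOneHot {
  // Hypothetical helper: returns one 0/1 flag per runway in `allRunways`,
  // in the same order, marking whether that runway appears in `runwaysCsv`.
  def encode(runwaysCsv: String, allRunways: Seq[String]): Seq[Int] = {
    val inUse = runwaysCsv.split(",").toSet // plays the role of `runway_set` in the DataFrame
    allRunways.map(r => if (inUse.contains(r)) 1 else 0)
  }

  def main(args: Array[String]): Unit = {
    val allRunways = Seq("08L", "26R", "08R", "26L", "09", "27", "15L", "33R", "15R", "33L")
    // prints: 1 0 1 0 1 0 0 0 0 0  (same order as the runway_in_use_* columns above)
    println(encode("08L,08R,09", allRunways).mkString(" "))
  }
}
```

The Spark version needs the same thing: one derived column per element of a static list, which is what a loop or fold over that list can generate.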
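It feels like I should be able to take a static sequence of all the identifiers and do something programmatic to accomplish all of the above in a loop/map/foreach-style expression, but I'm not sure how to formulate it.
For example: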
val all_runways = Seq("08L","26R","08R","26L","09","27","15L","33R","15R","33L")
// iterate through and update each column, e.g. 'runway_in_use_$i'
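Any pointers? Thanks in advance.
Answer (uj5u.com community reply):
This is a typical use case for a fold (foldLeft): start from the DataFrame and add one column per runway, passing the updated DataFrame along as the accumulator.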
val df = data.toDF("runways")
.withColumn("runway_set", split('runways, ","))
// Start from df; for each runway id x, add a 0/1 column to the DataFrame accumulated so far
val df2 = all_runways.foldLeft(df) { (acc, x) =>
  acc.withColumn(s"runway_in_use_$x", when(array_contains('runway_set, x), 1).otherwise(0))
}
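A related option, offered here as an alternative rather than as part of the answer above: Spark's own documentation for withColumn notes that each call adds a projection, and that calling it many times (for instance in a loop) can produce large query plans; it suggests a single select with multiple columns instead. A minimal sketch of that approach, assuming Spark 3.x on the classpath (the RunwayFlags object and addRunwayFlags method are hypothetical names):

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, split, when}

object RunwayFlags {
  // Hypothetical helper: builds every runway_in_use_* column up front and
  // adds them all in one select, instead of one withColumn call per runway.
  def addRunwayFlags(df: DataFrame, runways: Seq[String]): DataFrame = {
    val flagCols: Seq[Column] = runways.map { r =>
      when(array_contains(col("runway_set"), r), 1).otherwise(0).as(s"runway_in_use_$r")
    }
    // Keep all existing columns, then append the flag columns in list order
    df.select(col("*") +: flagCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("runway-flags").getOrCreate()
    import spark.implicits._

    val allRunways = Seq("08L", "26R", "08R", "26L", "09", "27", "15L", "33R", "15R", "33L")
    val df = Seq("08L,08R,09").toDF("runways")
      .withColumn("runway_set", split(col("runways"), ","))

    addRunwayFlags(df, allRunways).show()
    spark.stop()
  }
}
```

The result has the same columns as the foldLeft version; the difference is only in how the plan is built. On Spark 3.3 and later, withColumns, which takes a Map from column names to columns, is another way to add them in one call.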
Please credit the source when reposting. Original link: https://www.uj5u.com/caozuo/440172.html
