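I have a column of type ArrayType in a DataFrame. Some of the arrays in this column contain empty values, and some are empty altogether. For example: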
-------
|xyz|
-------
|[a,,] |
|[] |
-------
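I want to remove the empty values from the arrays, drop the rows whose array ends up empty, and output the column as: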
-------
|xyz|
-------
|[a] |
-------
How can I achieve this for this column? Thank you.
A uj5u.com user replied:
You can use map and filter to achieve this.
import spark.implicits._
import org.apache.spark.sql.Row
import scala.collection.mutable.WrappedArray
import org.apache.spark.sql.functions.{col, size}
val data = Seq(Array("a",null,""), Array(""))
val rdd = spark.sparkContext.parallelize(data)
val df = rdd.toDF("xyz")
df.show()
------
| xyz|
------
|[a,, ]|
| []|
------
// Use map to filter out all the null or empty strings, then remove rows that are empty arrays
val mappedDF = df
  .map { case Row(x: WrappedArray[String]) =>
    x
      .filter(_ != null)
      .filter(_.nonEmpty)
  }
  .toDF("xyz")
  .filter(size(col("xyz")) > 0)
mappedDF.show()
---
|xyz|
---
|[a]|
---
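If pattern matching on Row and WrappedArray feels awkward, a typed Dataset may be an alternative. The following is only a sketch under some assumptions: the local SparkSession setup, the app name and the name typedDF are made up for illustration, and it assumes no row holds a null array (a null array would throw inside the map).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, size}

// Assumption: a local session for illustration only.
// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("clean-arrays")
  .getOrCreate()
import spark.implicits._

// Same sample data as above: one array with a null and an empty string,
// and one array holding only an empty string.
val df = Seq(Seq("a", null, ""), Seq("")).toDF("xyz")

val typedDF = df
  .as[Seq[String]]                              // read each row as a Seq[String]
  .map(_.filter(s => s != null && s.nonEmpty))  // drop null and empty elements
  .toDF("xyz")                                  // restore the column name after map
  .filter(size(col("xyz")) > 0)                 // drop rows whose array is now empty

typedDF.show()  // only the row [a] should remain
```

The logic is the same as in the map/filter version; the only difference is that the encoder for Seq[String] does the unpacking instead of a Row pattern.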
A uj5u.com user replied:
Try:
import org.apache.spark.sql.functions.{size, col, expr}
df.withColumn("xyz", expr("filter(xyz, x -> x is not null)"))
.filter(size(col("xyz")) > 0)
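Note that the lambda above only drops null elements, so an empty string such as the one in the first answer's sample data would survive. A possible variant that removes both is sketched below. It assumes Spark 2.4 or later, where the SQL higher-order function filter is available; the session setup, the app name and the name cleaned are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, size}

// Assumption: a local session for illustration only.
// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("clean-arrays-sql")
  .getOrCreate()
import spark.implicits._

// Sample data mirroring the first answer: nulls and empty strings mixed in.
val df = Seq(Seq("a", null, ""), Seq("")).toDF("xyz")

val cleaned = df
  // Keep only elements that are neither null nor the empty string.
  .withColumn("xyz", expr("filter(xyz, x -> x is not null and x != '')"))
  // Drop rows whose array is empty after the cleanup.
  .filter(size(col("xyz")) > 0)

cleaned.show()  // only the row [a] should remain
```

Because the filtering stays inside a SQL expression, no Scala lambda has to be serialized and the rows are never converted to JVM objects, which is usually the cheaper route when it is available.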
