I want to convert a column of integer type into a column of list type.
Given the DataFrame
a b
0 9 2
1 9 3
I want to convert it to
a b
0 9 [2]
1 9 [3]
Pandas solution
import pandas as pd
# Same values as the example table above
df = pd.DataFrame({"a": [9, 9], "b": [2, 3]})
# Wrap each value of b in a one-element list
df["b"] = df["b"].apply(lambda x: [x])
How can I achieve the same thing in pyspark?
I tried a naive approach
from pyspark.sql.types import IntegerType, ArrayType
from pyspark.sql.functions import col
df_sp = spark.createDataFrame(df)
# EDIT according to 過過招's answer
df_sp = df_sp.withColumn("b",col("b").cast(ArrayType(IntegerType())))
display(df_sp)
This gives the error AnalysisException: cannot resolve 'b' due to data type mismatch: cannot cast bigint to array<int>;
Answer from a uj5u.com user:
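You need to wrap the element in an array. A cast cannot turn a bigint into an array; use the array function instead.
from pyspark.sql.functions import array, col
df_sp = df_sp.withColumn("b", array(col("b")))
df_sp.show()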
