I have a Date column and an Hour column in a PySpark DataFrame. How can I combine them to get the Desired_Calculated_Result column?
df1 = sqlContext.createDataFrame(
[
('2021-10-20','1300', '2021-10-20 13:00:00.000 0000')
,('2021-10-20','1400', '2021-10-20 14:00:00.000 0000')
,('2021-10-20','1500', '2021-10-20 15:00:00.000 0000')
]
,['Date', 'Hour', 'Desired_Calculated_Result']
)
I have also tried:
from pyspark.sql.functions import concat_ws, unix_timestamp
df1.withColumn("TimeStamp", unix_timestamp(concat_ws(" ", df1.Date, df1.Hour), "yyyy-MM-dd HHmm").cast("timestamp")).show()
This returns null for every row in the TimeStamp column.
Answer:
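from pyspark.sql.functions import concat, unix_timestamp
# Concatenate Date and Hour with no separator ("2021-10-20" + "1300" -> "2021-10-201300"),
# parse the result with a pattern that matches it exactly, then cast the epoch seconds to a timestamp
df1\
.withColumn("TimeStamp", unix_timestamp(concat(df1.Date, df1.Hour), "yyyy-MM-ddHHmm")\
.cast("timestamp"))\
.show()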
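The key idea in the answer is that the concatenated string and the format pattern must line up character for character. As an illustration outside Spark (not the answer's own code), the same concatenate-then-parse step can be checked with Python's standard-library datetime.strptime. Note that Python uses different format codes: "%Y-%m-%d%H%M" here plays the role of Spark's "yyyy-MM-ddHHmm". The helper name combine is hypothetical.

```python
from datetime import datetime

# Sample values mirroring the Date and Hour columns from the question
rows = [("2021-10-20", "1300"), ("2021-10-20", "1400"), ("2021-10-20", "1500")]


def combine(date_str: str, hour_str: str) -> datetime:
    """Hypothetical helper: join a date string and an HHmm string into a datetime."""
    # Concatenate with no separator, as concat() does: "2021-10-20" + "1300" -> "2021-10-201300"
    combined = date_str + hour_str
    # "%Y-%m-%d%H%M" is the Python counterpart of Spark's "yyyy-MM-ddHHmm" pattern
    return datetime.strptime(combined, "%Y-%m-%d%H%M")


for date_str, hour_str in rows:
    # Prints e.g. "2021-10-20 13:00:00.000" for the first row
    print(combine(date_str, hour_str).strftime("%Y-%m-%d %H:%M:%S.000"))
```

If the separator in the string and in the pattern disagree (for example, a space in one but not the other), parsing fails. In Spark's unix_timestamp, such a parse failure shows up as null rather than as an exception.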
