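I need to subtract datetimes to get an elapsed-time column. I was able to combine the separate date and time columns into two combined columns called pickup and dropoff. However, I have not managed to turn these columns into a datetime type. Below, 'pickup' and 'dropoff' are strings. Is there a way to convert these columns to a datetime type?
I have been struggling because the values do not include AM/PM. The PySpark DataFrame looks like this. Thanks!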
df.show()
+-----------+-----------+------------+------------+--------+----+-----+-------------+-------------+
|pickup_date|pickup_time|dropoff_date|dropoff_time|distance| tip| fare|       pickup|      dropoff|
+-----------+-----------+------------+------------+--------+----+-----+-------------+-------------+
|   1/1/2017|       0:00|    1/1/2017|        0:00|    0.02|   0| 52.8|1/1/2017 0:00|1/1/2017 0:00|
|   1/1/2017|       0:00|    1/1/2017|        0:03|     0.5|   0|  5.3|1/1/2017 0:00|1/1/2017 0:03|
|   1/1/2017|       0:00|    1/1/2017|        0:39|    7.75|4.66|27.96|1/1/2017 0:00|1/1/2017 0:39|
|   1/1/2017|       0:00|    1/1/2017|        0:06|     0.8|1.45| 8.75|1/1/2017 0:00|1/1/2017 0:06|
|   1/1/2017|       0:00|    1/1/2017|        0:08|     0.9|   0|  8.3|1/1/2017 0:00|1/1/2017 0:08|
|   1/1/2017|       0:00|    1/1/2017|        0:05|    1.76|   0|  8.3|1/1/2017 0:00|1/1/2017 0:05|
|   1/1/2017|       0:00|    1/1/2017|        0:15|    8.47|7.71|38.55|1/1/2017 0:00|1/1/2017 0:15|
|   1/1/2017|       0:00|    1/1/2017|        0:11|     2.4|   0| 11.8|1/1/2017 0:00|1/1/2017 0:11|
Answer from a uj5u.com user:
Convert the string columns to the timestamp data type, then subtract one from the other.
Code:
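import org.apache.spark.sql.functions.{col, to_timestamp}
import org.apache.spark.sql.types.LongType
import spark.implicits._ // needed for toDF outside spark-shell; assumes a SparkSession named spark

val data = Seq(("1/1/2017 0:00", "1/1/2017 0:35"))
val df = data.toDF("pickup_dt", "drop_dt")

// "H" is the 24-hour hour-of-day, so no AM/PM marker is required.
// This pattern reads the first number as the day; use "M/d/yyyy H:mm" if the data is month-first.
df
  .withColumn("pickup_dt", to_timestamp(col("pickup_dt"), "d/M/yyyy H:mm"))
  .withColumn("drop_dt", to_timestamp(col("drop_dt"), "d/M/yyyy H:mm"))
  // Casting a timestamp to long yields epoch seconds; dividing by 60 gives minutes.
  .withColumn("diff", (col("drop_dt").cast(LongType) - col("pickup_dt").cast(LongType)) / 60)
  .show(false)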
Output:
+-------------------+-------------------+----+
|pickup_dt          |drop_dt            |diff|
+-------------------+-------------------+----+
|2017-01-01 00:00:00|2017-01-01 00:35:00|35.0|
+-------------------+-------------------+----+
PySpark:
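from pyspark.sql.functions import col, to_timestamp

# Parse the strings into timestamps first, then subtract epoch seconds and divide by 60 for minutes.
(df
    .withColumn("pickup_dt", to_timestamp(col("pickup_dt"), "d/M/yyyy H:mm"))
    .withColumn("drop_dt", to_timestamp(col("drop_dt"), "d/M/yyyy H:mm"))
    .withColumn("diff", (col("drop_dt").cast("long") - col("pickup_dt").cast("long")) / 60.0)
    .show(truncate=False))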
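To sanity-check what the Spark result should be for a few rows, the same parse-and-subtract logic can be reproduced with Python's standard library alone. This is only a rough sketch: the helper name `elapsed_minutes` is made up for illustration, and the format string assumes day/month/year order with a 24-hour clock, mirroring the "d/M/yyyy H:mm" pattern used in the answer. If the data is actually month-first, the `%d` and `%m` would need to be swapped.

```python
from datetime import datetime

# Assumed layout: day/month/year with a 24-hour clock (no AM/PM),
# mirroring the Spark pattern "d/M/yyyy H:mm".
FMT = "%d/%m/%Y %H:%M"


def elapsed_minutes(pickup: str, dropoff: str) -> float:
    """Return the minutes between two strings such as '1/1/2017 0:39'."""
    start = datetime.strptime(pickup, FMT)
    end = datetime.strptime(dropoff, FMT)
    # Subtracting two datetimes gives a timedelta; convert it to minutes.
    return (end - start).total_seconds() / 60


print(elapsed_minutes("1/1/2017 0:00", "1/1/2017 0:35"))  # 35.0
```

The value printed here matches the `diff` column in the Spark output shown in the answer for the same pair of timestamps.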
