Is it possible to get the first Datetime of each day from a DataFrame?
Schema:
root
|-- Datetime: timestamp (nullable = true)
|-- Quantity: integer (nullable = true)
+-------------------+--------+
|           Datetime|Quantity|
+-------------------+--------+
|2021-09-10 10:08:11|     200|
|2021-09-10 10:08:16|     100|
|2021-09-11 10:05:11|     100|
|2021-09-11 10:07:25|     100|
|2021-09-11 10:07:14|    3000|
|2021-09-12 09:24:11|    1000|
+-------------------+--------+
Expected output:
+-------------------+--------+
|           Datetime|Quantity|
+-------------------+--------+
|2021-09-10 10:08:11|     200|
|2021-09-11 10:05:11|     100|
|2021-09-12 09:24:11|    1000|
+-------------------+--------+
Answer:
You can use row_number for this. Define a window that is partitioned by day and ordered by Datetime, then keep only the first row of each partition:
from pyspark.sql import functions as F, Window
w = Window.partitionBy(F.to_date("Datetime")).orderBy("Datetime")
df1 = df.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")
df1.show()
# +-------------------+--------+
# |           Datetime|Quantity|
# +-------------------+--------+
# |2021-09-10 10:08:11|     200|
# |2021-09-11 10:05:11|     100|
# |2021-09-12 09:24:11|    1000|
# +-------------------+--------+
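To see what the window is doing, here is a minimal plain-Python sketch of the same logic, without Spark. It is only an illustration of the semantics, not how Spark executes the query. The helper name first_row_per_day is hypothetical, and the rows are the sample data from the question. Sorting by timestamp stands in for orderBy("Datetime"), and grouping by the date part stands in for partitionBy(F.to_date("Datetime")).

```python
from datetime import datetime
from itertools import groupby

# Sample rows copied from the question: (Datetime, Quantity)
rows = [
    (datetime(2021, 9, 10, 10, 8, 11), 200),
    (datetime(2021, 9, 10, 10, 8, 16), 100),
    (datetime(2021, 9, 11, 10, 5, 11), 100),
    (datetime(2021, 9, 11, 10, 7, 25), 100),
    (datetime(2021, 9, 11, 10, 7, 14), 3000),
    (datetime(2021, 9, 12, 9, 24, 11), 1000),
]


def first_row_per_day(rows):
    """Return the earliest row of each calendar day (hypothetical helper)."""
    # Sort by timestamp so that rows of the same day are adjacent and
    # the earliest row of each day comes first within its group.
    ordered = sorted(rows, key=lambda r: r[0])
    # Group by the date part and keep only the first row of each group,
    # which corresponds to keeping row_number == 1 in the Spark answer.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0].date())]


result = first_row_per_day(rows)
for ts, qty in result:
    print(ts, qty)
```

One thing to keep in mind with the Spark version: if two rows on the same day share exactly the same Datetime, row_number still returns only one of them, and which one is not guaranteed unless you add another column to the orderBy.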
Tags: dataframe apache-spark pyspark apache-spark-sql
