How to sum multiple row values with groupBy in PySpark?

Given the PySpark DataFrame below, I need to sum the row values with groupBy:

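+-------------------+---------+------------------------------+------------------------------+------------------------------+
|            load_dt|org_cntry|sum(srv_curr_vo_qty_accs_mthd)|sum(srv_curr_bb_qty_accs_mthd)|sum(srv_curr_tv_qty_accs_mthd)|
+-------------------+---------+------------------------------+------------------------------+------------------------------+
|2021-12-06 00:00:00|     null|                           NaN|                           NaN|                           NaN|
|2021-12-06 00:00:00|   PANAMA|                      360126.0|                      214229.0|                      207950.0|
+-------------------+---------+------------------------------+------------------------------+------------------------------+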

Conditions:

1. groupby(load_dt, org_cntry)

2. Sum the row values: sum(srv_curr_vo_qty_accs_mthd) + sum(srv_curr_bb_qty_accs_mthd) + sum(srv_curr_tv_qty_accs_mthd)

Expected output:

load_dt     org_cntry   total_sum
2021-12-06  Panama       782305

A uj5u.com user replied:

Simply add the three sums together inside the aggregation:

from pyspark.sql import functions as F

# Sum each column per group, then add the three group sums into one value.
df.groupBy("load_dt", "org_cntry").agg(
    (
        F.sum("srv_curr_vo_qty_accs_mthd")
        + F.sum("srv_curr_bb_qty_accs_mthd")
        + F.sum("srv_curr_tv_qty_accs_mthd")
    ).alias("total_sum")
)

A uj5u.com user replied:

In this case, you can use the higher-order functions available since Spark 2.4.

Example:

# Sample DataFrame
# +-------------------+---------+--------+--------+--------+
# |            load_dt|org_cntry|      s1|      s2|      s3|
# +-------------------+---------+--------+--------+--------+
# |2021-12-06 00:00:00|   PANAMA|360126.0|214229.0|207950.0|
# +-------------------+---------+--------+--------+--------+

from pyspark.sql.functions import expr

# Create an array from the sum columns, then add up all the array elements.
df.selectExpr("*", "AGGREGATE(array(s1,s2,s3), cast(0 as double), (x, y) -> x + y) total_sum").show()

# The same expression using withColumn
df.withColumn("total_sum", expr("AGGREGATE(array(s1,s2,s3), cast(0 as double), (x, y) -> x + y)")).show()

# +-------------------+---------+--------+--------+--------+---------+
# |            load_dt|org_cntry|      s1|      s2|      s3|total_sum|
# +-------------------+---------+--------+--------+--------+---------+
# |2021-12-06 00:00:00|   PANAMA|360126.0|214229.0|207950.0| 782305.0|
# +-------------------+---------+--------+--------+--------+---------+
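For readers unfamiliar with AGGREGATE: it folds the array from left to right, starting from the initial value (here `cast(0 as double)`) and applying the lambda `(x, y) -> x + y` to the running total and each element. The plain-Python sketch below mirrors that fold for the sample row. It runs outside Spark and is only an illustration of the semantics; the variable names are hypothetical.

```python
from functools import reduce

# Values of s1, s2, s3 from the sample row above.
values = [360126.0, 214229.0, 207950.0]

# Same shape as AGGREGATE(array, start, (x, y) -> x + y):
# x is the running total, y is the next array element.
total_sum = reduce(lambda x, y: x + y, values, 0.0)

print(total_sum)  # 782305.0
```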
