Group by the date range from the date_start to date_end columns

I have a table with the following structure:

place_id            date_start       date_end
2826088480774       2017-09-19       2017-09-20
1898375544837       2017-08-01       2017-08-03
1425929142277       2017-09-23       2017-10-03
1013612281863       2016-10-12       2016-10-14
1795296329731       2016-10-13       2016-10-13
695784701956        2017-09-11       2017-11-02

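I want to count how many events each place has per month (each row is one event). If an event's dates span several months, every affected month should be counted.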

place_id can repeat, so I wrote the following query:

SELECT place_id,
       EXTRACT(MONTH FROM date_start) AS month,
       EXTRACT(YEAR FROM date_start)  AS year,
       COUNT(*) AS events
FROM Table
GROUP BY place_id, year, month
ORDER BY month, year, events DESC

This gives me the following grouped table:

place_id         month      year   events
2826088480774       8       2017     345
1898375544837       8       2017     343
1425929142277       8       2017     344
1013612281863       8       2017     355
1795296329731       8       2017     348
695784701956        8       2017     363

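The problem is that the data is grouped only by date_start, and I don't see how to distribute each event across all the months it covers, from date_start through date_end.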
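To make the limitation concrete, here is a plain-Python sketch (not SQL, and not the asker's code) of what the query above computes on the six sample rows: each event is keyed only by the month of date_start, so date_end never affects the result. The names rows and by_start_month are illustrative.

```python
from collections import Counter
from datetime import date

# The six sample rows from the question: (place_id, date_start, date_end).
rows = [
    (2826088480774, date(2017, 9, 19), date(2017, 9, 20)),
    (1898375544837, date(2017, 8, 1), date(2017, 8, 3)),
    (1425929142277, date(2017, 9, 23), date(2017, 10, 3)),
    (1013612281863, date(2016, 10, 12), date(2016, 10, 14)),
    (1795296329731, date(2016, 10, 13), date(2016, 10, 13)),
    (695784701956, date(2017, 9, 11), date(2017, 11, 2)),
]

# Same grouping as the query: key = (place_id, year, month) of date_start only.
by_start_month = Counter(
    (place_id, start.year, start.month) for place_id, start, _end in rows
)

print(by_start_month[(1425929142277, 2017, 9)])   # 1
print(by_start_month[(1425929142277, 2017, 10)])  # 0: October is never credited
```

The event for 1425929142277 runs from September 23 to October 3, yet only September receives it.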

Answer:

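You can use the sequence function to generate every date between date_start and date_end, then explode the resulting array column, and group and count as you already did: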
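For readers without a Spark session at hand, a plain-Python sketch of the same steps may help (this is not Spark; rows holds the question's six sample rows and days_between is a hypothetical helper). Each event is expanded into one record per day, like sequence plus explode, and the records are then counted per (place_id, month, year), like GROUP BY with count(*).

```python
from collections import Counter
from datetime import date, timedelta

# The six sample rows from the question: (place_id, date_start, date_end).
rows = [
    (2826088480774, date(2017, 9, 19), date(2017, 9, 20)),
    (1898375544837, date(2017, 8, 1), date(2017, 8, 3)),
    (1425929142277, date(2017, 9, 23), date(2017, 10, 3)),
    (1013612281863, date(2016, 10, 12), date(2016, 10, 14)),
    (1795296329731, date(2016, 10, 13), date(2016, 10, 13)),
    (695784701956, date(2017, 9, 11), date(2017, 11, 2)),
]

def days_between(start, end):
    """Yield every date from start to end inclusive (like sequence(..., interval 1 day))."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)

# "explode" to one record per (place_id, day), then count per (place_id, month, year).
day_counts = Counter(
    (place_id, day.month, day.year)
    for place_id, start, end in rows
    for day in days_between(start, end)
)

# ORDER BY month, year, events DESC
for key, n in sorted(day_counts.items(), key=lambda kv: (kv[0][1], kv[0][2], -kv[1])):
    print(key, n)
```

Run on the sample rows, this prints the same nine (place_id, month, year, events) combinations, in the same order, as the output shown below the query.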

df.createOrReplaceTempView("EventsTable")

spark.sql("""
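    WITH events AS (
        -- explode: one output row per date in the event's date range
        SELECT  place_id, 
                explode(event_dates) as event_date
        FROM    (
            -- sequence: array of every date from date_start to date_end, inclusive
            SELECT  place_id, 
                    sequence(date_start, date_end, interval 1 day) as event_dates
            FROM    EventsTable
        )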
    )
    
    SELECT  place_id, 
            month(event_date) as month, 
            year(event_date)  as year,
            count(*)          as events
    FROM    events
    GROUP BY 1, 2, 3
    ORDER BY month, year, events desc
""").show()

//+-------------+-----+----+------+
//|     place_id|month|year|events|
//+-------------+-----+----+------+
//|1898375544837|    8|2017|     3|
//|695784701956 |    9|2017|    20|
//|1425929142277|    9|2017|     8|
//|2826088480774|    9|2017|     2|
//|1013612281863|   10|2016|     3|
//|1795296329731|   10|2016|     1|
//|695784701956 |   10|2017|    31|
//|1425929142277|   10|2017|     3|
//|695784701956 |   11|2017|     2|
//+-------------+-----+----+------+
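Because the query explodes to one row per day, count(*) here counts days, not events: 695784701956 gets 31 for October 2017 because its single event covers every day of that month. If each event should instead count once in every month it touches, as the question describes, one option is to enumerate months rather than days. The plain-Python sketch below shows that logic on the sample rows; months_touched is a hypothetical helper name. In Spark SQL, an analogous (untested here) change would be to generate month starts with sequence(trunc(date_start, 'MM'), trunc(date_end, 'MM'), interval 1 month) and keep count(*).

```python
from collections import Counter
from datetime import date

# The six sample rows from the question: (place_id, date_start, date_end).
rows = [
    (2826088480774, date(2017, 9, 19), date(2017, 9, 20)),
    (1898375544837, date(2017, 8, 1), date(2017, 8, 3)),
    (1425929142277, date(2017, 9, 23), date(2017, 10, 3)),
    (1013612281863, date(2016, 10, 12), date(2016, 10, 14)),
    (1795296329731, date(2016, 10, 13), date(2016, 10, 13)),
    (695784701956, date(2017, 9, 11), date(2017, 11, 2)),
]

def months_touched(start, end):
    """Yield (year, month) for every calendar month from start's month to end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        # Step one month forward, rolling December over into January.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

# Each event contributes exactly 1 to every (place_id, month, year) it overlaps.
event_counts = Counter(
    (place_id, month, year)
    for place_id, start, end in rows
    for year, month in months_touched(start, end)
)

print(event_counts[(695784701956, 10, 2017)])  # 1, whereas the day-based count is 31
```

With only one event per place in the sample, every count is 1; on real data the counts would reflect how many events overlap each month.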
