全網最詳細的Hive文章系列,強烈建議收藏加關注!
后面更新文章都會列出歷史文章目錄,幫助大家回顧知識重點,
目錄
系列歷史文章
前言
Hive的開窗函式
一、視窗函式 ROW_NUMBER,RANK,DENSE_RANK
1、資料準備
2、ROW_NUMBER
???????3、RANK 和 DENSE_RANK
???????二、Hive分析視窗函式 SUM,AVG,MIN,MAX
???????1、資料準備
???????2、SUM(結果和ORDER BY相關,默認為升序)
???????3、AVG,MIN,MAX
系列歷史文章
2021年大資料Hive(六):Hive的表生成函式
2021年大資料Hive(五):Hive的內置函式(數學、字串、日期、條件、轉換、行轉列)
2021年大資料Hive(四):Hive查詢語法
2021年大資料Hive(三):手把手教你如何吃透Hive資料庫和表操作(學會秒變數倉大佬)
2021年大資料Hive(二):Hive的三種安裝模式和MySQL搭配使用
2021年大資料Hive(一):Hive基本概念
前言
2021年全網最詳細的大資料筆記,輕松帶你從入門到精通,該欄目每天更新,匯總知識分享

Hive的開窗函式
一、視窗函式 ROW_NUMBER,RANK,DENSE_RANK
???????1、資料準備
cookie1,2018-04-10,1
cookie1,2018-04-11,5
cookie1,2018-04-12,7
cookie1,2018-04-13,3
cookie1,2018-04-14,2
cookie1,2018-04-15,4
cookie1,2018-04-16,4
cookie2,2018-04-10,2
cookie2,2018-04-11,3
cookie2,2018-04-12,5
cookie2,2018-04-13,6
cookie2,2018-04-14,3
cookie2,2018-04-15,9
cookie2,2018-04-16,7
CREATE TABLE itcast_t1 (
cookieid string,
createtime string, --day
pv INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- 加載資料:
load data local inpath '/export/data/hivedatas/itcast_t2.txt' into table itcast_t1;
???????2、ROW_NUMBER
ROW_NUMBER() 從1開始,按照順序,生成分組內記錄的序列
SELECT
cookieid,
createtime,
pv,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn
FROM itcast_t1;
???????3、RANK 和 DENSE_RANK
RANK() 生成資料項在分組中的排名,排名相等會在名次中留下空位
DENSE_RANK() 生成資料項在分組中的排名,排名相等會在名次中不會留下空位
SELECT
cookieid,
createtime,
pv,
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1,
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3
FROM itcast_t1
WHERE cookieid = 'cookie1';
???????二、Hive分析視窗函式 SUM,AVG,MIN,MAX
???????1、資料準備
--建表陳述句:
create table itcast_t2(
cookieid string,
createtime string, --day
pv int
) row format delimited
fields terminated by ',';
--加載資料:
load data local inpath '/root/hivedata/ itcast_t2.txt' into table itcast_t2;
--開啟智能本地模式
SET hive.exec.mode.local.auto=true;
???????2、SUM(結果和ORDER BY相關,默認為升序)
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime) as pv1
from itcast_t2;
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itcast_t2;
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid) as pv3
from itcast_t2; --如果沒有order by排序陳述句 默認把分組內的所有資料進行sum操作
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as pv4
from itcast_t2;
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5
from itcast_t2;
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as pv6
from itcast_t2;
--pv1: 分組內從起點到當前行的pv累積,如,11號的pv1=10號的pv+11號的pv, 12號=10號+11號+12號
--pv2: 同pv1
--pv3: 分組內(cookie1)所有的pv累加
--pv4: 分組內當前行+往前3行,如,11號=10號+11號, 12號=10號+11號+12號,
13號=10號+11號+12號+13號, 14號=11號+12號+13號+14號
--pv5: 分組內當前行+往前3行+往后1行,如,14號=11號+12號+13號+14號+15號=5+7+3+2+4=21
--pv6: 分組內當前行+往后所有行,如,13號=13號+14號+15號+16號=3+2+4+4=13,
14號=14號+15號+16號=2+4+4=10
/*
- 如果不指定rows between,默認為從起點到當前行;
- 如果不指定order by,則將分組內所有值累加;
- 關鍵是理解rows between含義,也叫做window子句:
- preceding:往前
- following:往后
- current row:當前行
- unbounded:起點
- unbounded preceding 表示從前面的起點
- unbounded following:表示到后面的終點
*/
???????3、AVG,MIN,MAX
AVG,MIN,MAX和SUM用法一樣
select cookieid,createtime,pv,
avg(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itcast_t2;
select cookieid,createtime,pv,
max(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itcast_t2;
select cookieid,createtime,pv,
min(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itcast_t2;
本博客大資料系列文章會一直每天更新,記得收藏加關注喔~
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/287908.html
標籤:其他
