目錄
- 一、數倉分層
- 1.1 為什么要分層
- 1.2 資料集市與資料倉庫概念
- 1.3 數倉命名規范
- 1.3.1 表命名
- 1.3.2 腳本命名
- 1.3.3 表欄位型別
- 二、數倉理論
- 2.1 范式理論
- 2.1.1 范式概念
- 2.1.2 函式依賴
- 2.1.3 三范式區分
- 2.2 關系建模與維度建模
- 2.2.1 關系建模
- 2.2.2 維度建模??
- 2.3 維度表和事實表??
- 2.3.1 維度表
- 2.3.2 事實表
- 2.4 維度模型分類
- 2.5 資料倉庫建模??🌟
- 2.5.1 ODS層
- 2.5.2 DIM層和DWD層
- 2.5.3 DWS層與DWT層
- 2.5.4 ADS層
- 三、數倉環境搭建
- 3.1 Hive環境搭建
- 3.1.1 Hive引擎簡介
- 3.1.2 Hive on Spark配置
- 3.1.3 Hive on Spark測驗
- 3.2 Yarn配置
- 3.2.1 增加ApplicationMaster資源比例
- 3.3 數倉開發環境
- 3.4 資料準備
- 四、數倉搭建-ODS層
- 4.1 ODS層(用戶行為資料)
- 4.1.1 創建日志表ods_log
- 4.1.2 ODS層日志表加載資料腳本
- 4.2 ODS層(業務資料)
- 4.2.1 活動資訊表
- 4.2.2 活動規則表
- 4.2.3 一級品類表
- 4.2.4 二級品類表
- 4.2.5 三級品類表
- 4.2.6 編碼字典表
- 4.2.7 省份表
- 4.2.8 地區表
- 4.2.9 品牌表
- 4.2.10 購物車表
- 4.2.11 評論表
- 4.2.12 優惠券資訊表
- 4.2.13 優惠券領用表
- 4.2.14 收藏表
- 4.2.15 訂單明細表
- 4.2.16 訂單明細活動關聯表
- 4.2.17 訂單明細優惠券關聯表
- 4.2.18 訂單表
- 4.2.19 退單表
- 4.2.20 訂單狀態日志表
- 4.2.21 支付表
- 4.2.22 退款表
- 4.2.23 商品平臺屬性表
- 4.2.24 商品(SKU)表
- 4.2.25 商品銷售屬性表
- 4.2.26 商品(SPU)表
- 4.2.27 用戶表
- 4.2.28 ODS層業務表首日資料裝載腳本
- 4.2.29 ODS層業務表每日資料裝載腳本
- 五、數倉搭建-DIM層
- 5.1 商品維度表(全量)
- 優惠券維度表(全量)
- 5.3 活動維度表(全量)
- 5.4 地區維度表(特殊)
- 5.5 時間維度表(特殊)
- 5.6 用戶維度表(拉鏈表)
- 5.6.1 拉鏈表概述
- 5.6.2 制作拉鏈表
- 5.7 DIM層首日資料裝載腳本
- 5.8 DIM層每日資料裝載腳本
- 六、數倉搭建-DWD層
- 6.1 DWD層 (用戶行為日志)
- 6.1.1 日志決議思路
- 6.1.2 get_json_object函式使用
- 6.1.3 啟動日志表
- 6.1.4 頁面日志表
- 6.1.5 動作日志表
- 6.1.6 曝光日志表
- 6.1.7 錯誤日志表
- 6.1.8 DWD層用戶行為資料加載腳本
- 6.2 DWD層(業務資料)
- 6.2.1 評價事實表(事務型事實表)
- 6.2.2 訂單明細事實表(事務型事實表)
- 6.2.3 退單事實表(事務型事實表)
- 6.2.4 加購事實表(周期型快照事實表,每日快照)
- 6.2.5 收藏事實表(周期型快照事實表,每日快照)
- 6.2.6 優惠券領用事實表(累積型快照事實表)
- 6.2.7 支付事實表(累積型快照事實表)
- 6.2.8 退款事實表(累積型快照事實表)
- 6.2.9 訂單事實表(累積型快照事實表)
- 6.2.10 DWD層業務資料首日裝載腳本
- 6.2.11 DWD層業務資料每日裝載腳本
------------------------------------------分隔符----------------------------------------------------
資料倉庫之電商數倉-- 1、用戶行為資料采集==>
資料倉庫之電商數倉-- 2、業務資料采集平臺==>
一、數倉分層
1.1 為什么要分層
數倉分層

總結??:
??資料倉庫分層:
ODS(Operation Data Store)層: 原始資料層,存放原始資料,直接加載原始日志、資料,資料保持原貌不做處理;
DWD(Data Warehouse Detail)層: 對ODS層資料進行清洗(去除空值、臟資料、超過極限范圍的資料)、脫敏等,保存業務事實明細,一行資訊代表一次業務行為,例如一次下單行為;
DIM(Dimension)層: 維度層,保存維度資料,主要是對業務事實的描述資訊,如何人何處何地等;
DWS(Data Warehouse Service)層: 以DWD層為基礎,按天進行輕度匯總,一行資訊代表一個主題物件一天的匯總行為,如一個用戶一天下單次數;
DWT(Data Warehouse Topic)層: 以DWS層為基礎,對資料進行累積匯總,一行資訊代表一個主題物件的累積行為,例如一個用戶從注冊開始至今下了多少單;
ADS(Application Data Store)層: 為各種統計報表提供資料,
??資料倉庫為什么要分層?
- 把復雜問題簡單化:
將復雜的任務分解成多層來完成?,每一層只處理簡單的任務,方便定位問題; - 減少重復開發:
規范資料分層,通過中間層資料,能夠減少極大地重復計算,增加一次計算結果的復用性; - 隔離原始資料:
不論是資料的例外還是資料的敏感性,使真實資料與統計資料解耦開,
1.2 資料集市與資料倉庫概念
??資料集市與資料倉庫區別:

1.3 數倉命名規范
1.3.1 表命名
ODS層命名為ods_表名
DIM層命名為dim_表名
DWD層命名為dwd_表名
DWS層命名為dws_表名
DWT層命名為dwt_表名
ADS層命名為ads_表名
臨時表命名為tmp_表名
1.3.2 腳本命名
資料源_to_目標_db/log.sh;
用戶行為腳本以log為后綴;業務資料腳本以db為后綴,
1.3.3 表欄位型別
數量型別為bigint;
金額型別為decimal(16, 2); 表示:16位有效數字,其中小數部分2位;
字串(名字,描述資訊等)型別為string;
主鍵外鍵型別為string;
時間戳型別為bigint;
二、數倉理論
2.1 范式理論
2.1.1 范式概念
-
定義
資料建模必須遵循一定的規則,在關系建模中,這種規則就是范式, -
目的
采用范式,可以降低資料的冗余性,
為什么要降低資料冗余性?
1). 十幾年前,磁盤很貴,為了減少磁盤存盤;
2). 以前沒有分布式系統,都是單機,只能增加磁盤,磁盤個數也是有限的;
3). 一次修改,需要修改多個表,很難保證資料一致性,
-
缺點
范式的缺點是獲取資料時,需要通過Join拼接出最后的資料, -
分類
目前業界范式有:第一范式(1NF)、第二范式(2NF)、第三范式(3NF)、巴斯-科德范式(BCNF)、第四范式(4NF)、第五范式(5NF),
2.1.2 函式依賴
函式依賴:完全函式依賴、部分函式依賴、傳遞函式依賴,

2.1.3 三范式區分
-
第一范式1NF核心原則:屬性不可分割;

-
第二范式2NF核心原則: 不能存在“
部分函式依賴”,即不能存在非主鍵欄位部分函式依賴于主鍵函式的現象;

-
第三范式3NF核心原則:
不能存在傳遞函式依賴,即不能存在非主鍵欄位傳遞函式依賴于主鍵欄位的現象,

2.2 關系建模與維度建模
關系建模和維度建模是兩種資料倉庫的建模技術,關系建模由Bill Inmon所倡導,維度建模由Ralph Kimball所倡導,
2.2.1 關系建模
關系建模將復雜的資料抽象為兩個概念——物體和關系,并使用規范化的方式表示出來,關系模型如圖所示,從圖中可以看出,較為松散、零碎,物理表數量多,
圖為關系模型示意圖:

關系模型嚴格遵循第三范式(3NF),資料冗余程度低,資料的一致性容易得到保證,由于資料分布于眾多的表中,查詢會相對復雜,在大資料的場景下,查詢效率相對較低,
2.2.2 維度建模??
維度模型如圖所示,從圖中可以看出,模型相對清晰、簡潔,
圖為維度模型示意圖:

維度模型以資料分析作為出發點,不遵循三范式,故資料存在一定的冗余,維度模型面向業務,將業務用事實表和維度表呈現出來,表結構簡單,故查詢簡單,查詢效率較高,
join少、shuffle少,
2.3 維度表和事實表??
2.3.1 維度表
維度表:一般是對事實的描述資訊,每一張維表對應現實世界中的一個物件或者概念,
🌰用戶、商品、日期、地區等,
維表的特征:
- 維表的范圍很寬(具有多個屬性、列比較多);
- 跟事實表相比,行數相對較小:通常< 10萬條;
- 內容相對固定:編碼表;
🌰:
時間維度表:

2.3.2 事實表
事實表中的每行資料代表一個業務事件(下單、支付、退款、評價等);
“事實”這個術語表示的是業務事件的度量值(可統計次數、個數、金額等),
🌰2020年5月21日,小明花50塊買了顆🍬;
維度表:時間、用戶、商品、商家,事實表:50塊錢、🍬,
每一個事實表的行包括:具有可加性的數值型的度量值、與維表相連接的外鍵,通常具有兩個和兩個以上的外鍵,
事實表的特征:
- 非常大;
- 內容相對的窄:列數較少(主要是外鍵id和度量值);
- 經常發生變化,每天會新增加很多,
1)事務型事實表
以每個事務或事件為單位,例如一個銷售訂單記錄,一筆支付記錄等,作為事實表里的一行資料,一旦事務被提交,事實表資料被插入,資料就不再進行更改,其更新方式為增量更新,
2)周期型快照事實表
周期型快照事實表中不會保留所有資料,只保留固定時間間隔的資料,使用全量同步策略,例如每天或者每月的銷售額,或每月的賬戶余額等,
🌰購物車,有加減商品,隨時都有可能變化,但是我們更關心每天結束時這里面有多少商品,方便我們后期統計分析,
3)累積型快照事實表
累積快照事實表用于跟蹤業務事實的變化,使用新增和變化同步策略,
🌰資料倉庫中可能需要累積或者存盤訂單從下訂單開始,到訂單商品被打包、運輸、和簽收的各個業務階段的時間點資料來跟蹤訂單宣告周期的進展情況,當這個業務程序進行時,事實表的記錄也要不斷更新,

總結:
事務型事實表: 適用于不會發生變化的業務,通常使用增量同步;
周期型事實表:適用于不關心明細操作、只關心結果的業務,通常使用全量同步策略;
累積型事實表:適用于會發生周期性變化的業務,通常使用新增和變化同步策略,
2.4 維度模型分類
在維度建模的基礎上又分為三種模型:星型模型、雪花模型、星座模型,


2.5 資料倉庫建模??🌟
2.5.1 ODS層
-
HDFS用戶行為資料: 通過kafka-flume-kafka資料采集通道傳輸到hdfs的log日志;
-
HDFS業務資料:通sqoop從mysql同步到hdfs的資料檔案;
-
針對HDFS上的用戶行為資料和業務資料,我們如何規劃處理?
1). 保持資料原貌不做任何修改,起到備份資料的作用;
2). 資料采用壓縮,減少磁盤存盤空間(例如:原始資料100G,可以壓縮到10G左右);
3). 創建磁區表,防止后續的全表掃描,
2.5.2 DIM層和DWD層
DIM層DWD層需構建維度模型,一般采用星型模型,呈現的狀態一般為星座模型,
維度建模一般按照以下四個步驟:
選擇業務程序→宣告粒度→確認維度→確認事實
-
選擇業務程序
在業務系統中,挑選我們感興趣的業務線,比如下單業務,支付業務,退款業務,物流業務,一條業務線對應一張事實表, -
宣告粒度
資料粒度指資料倉庫的資料中保存資料的細化程度或綜合程度的級別,
宣告粒度意味著精確定義事實表中的一行資料表示什么,應該盡可能選擇最小粒度,以此來應各種各樣的需求,典型的粒度宣告如下:
訂單事實表中一行資料表示的是一個訂單中的一個商品項;
支付事實表中一行資料表示的是一個支付記錄, -
確定維度
維度的主要作用是描述業務是事實,主要表示的是“誰,何處,何時”等資訊,
確定維度的原則是:后續需求中是否要分析相關維度的指標,
🌰需要統計,什么時間下的訂單多,哪個地區下的訂單多,哪個用戶下的訂單多,需要確定的維度就包括:時間維度、地區維度、用戶維度,
- 確定事實
此處的“事實”一詞,指的是業務中的度量值(次數、個數、件數、金額,可以進行累加),例如訂單金額、下單次數等,
在DWD層,以業務程序為建模驅動,基于每個具體業務程序的特點,構建最細粒度的明細層事實表,事實表可做適當的寬表化處理,
事實表和維度表的關聯比較靈活,但是為了應對更復雜的業務需求,可以將能關聯上的表盡量關聯上,

2.5.3 DWS層與DWT層
DWS層和DWT層統稱寬表層,這兩層的設計思想大致相同,通過以下案例進行闡述,
- 問題引出:兩個需求,統計每個省份訂單的個數、統計每個省份訂單的總金額
- 處理辦法:都是將省份表和訂單表進行join,group by省份,然后計算,同樣資料被計算了兩次,實際上類似的場景還會更多,
那怎么設計能避免重復計算呢?
針對上述場景,可以設計一張地區寬表,其主鍵為地區ID,欄位包含為:下單次數、下單金額、支付次數、支付金額等,上述所有指標都統一進行計算,并將結果保存在該寬表中,這樣就能有效避免資料的重復計算,
- 總結:
1). 需要建哪些寬表:以維度為基準,
2). 寬表里面的欄位:是站在不同維度的角度去看事實表,重點關注事實表聚合后的度量值,
3). DWS和DWT層的區別:DWS層存放的所有主題物件當天的匯總行為;
🌰每個地區當天的下單次數,下單金額等;
DWT層存放的是所有主題物件的累積行為;
🌰每個地區最近7天(15天、30天、60天)的下單次數、下單金額等,
2.5.4 ADS層
對電商系統各大主題指標分別進行分析,
三、數倉環境搭建
3.1 Hive環境搭建
3.1.1 Hive引擎簡介
Hive引擎包括:默認MR、tez、spark;
Hive on Spark: Hive既作為存盤元資料又負責SQL的決議優化,語法是HQL語法,執行引擎變成了Spark,Spark負責采用RDD執行,
Spark on Hive : Hive只作為存盤元資料,Spark負責SQL決議優化,語法是Spark SQL語法,Spark負責采用RDD執行,
對比:
Hive on Spark:周邊生態更完整;
Spark on Hive:計算性能高,
3.1.2 Hive on Spark配置
1. 兼容性說明:
注??: 官網下載的Hive3.1.2和Spark3.0.0默認是不兼容的,因為Hive3.1.2支持的Spark版本是2.4.5,所以需要我們重新編譯Hive3.1.2版本,
編譯步驟: 官網下載Hive3.1.2原始碼,修改pom檔案中參考的Spark版本為3.0.0,如果編譯通過,直接打包獲取jar包,如果報錯,就根據提示,修改相關方法,直到不報錯,打包獲取jar包,
2. 在Hive所在節點部署Spark:
- Spark官網下載jar包地址:
http://spark.apache.org/downloads.html - 上傳
spark-3.0.0-bin-hadoop3.2.tgz至/opt/software/spark目錄下:
[xiaobai@hadoop102 spark]$ ll
total 372316
-rw-r--r--. 1 root root 224453229 Oct 4 17:02 spark-3.0.0-bin-hadoop3.2.tgz
-rw-r--r--. 1 root root 156791324 Oct 4 17:01 spark-3.0.0-bin-without-hadoop.tgz
- 將重新編譯后帶有依賴的spark-3.0.0-bin-hadoop3.2.tgz解壓至/opt/module/目錄下:
[xiaobai@hadoop102 spark]$ tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module/
- 將解壓后的spark-3.0.0-bin-hadoop3.2改成
spark:
[xiaobai@hadoop102 module]$ mv spark-3.0.0-bin-hadoop3.2/ spark
- 配置SPARK_HOME環境變數
/etc/profile.d/my_env.sh:
[xiaobai@hadoop102 module]$ sudo vim /etc/profile.d/my_env.sh
增加以下內容:
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin

- source環境變數,使其生效:
[xiaobai@hadoop102 module]$ source /etc/profile.d/my_env.sh
3. 在hive中創建spark組態檔:
- 在hive中創建spark組態檔
spark-defaults.conf:
[xiaobai@hadoop102 software]$ vim /opt/module/hive/conf/spark-defaults.conf
- 添加如下內容(在執行任務時,會根據如下引數執行):
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop102:8020/spark-history
spark.executor.memory 1g
spark.driver.memory 1g
- 在HDFS創建
spark-history路徑,用于存盤歷史日志:
[xiaobai@hadoop102 software]$ hadoop fs -mkdir /spark-history

4. 向HDFS上傳Spark純凈版jar包:
- 在hdfs創建
spark-jars路徑:
[xiaobai@hadoop102 spark]$ hadoop fs -mkdir /spark-jars

- 解壓
spark-3.0.0-bin-without-hadoop.tgz至/opt/software/spark路徑:
[xiaobai@hadoop102 spark]$ tar -zxvf spark-3.0.0-bin-without-hadoop.tgz
- 將解壓后的Spark純凈版jar包
spark-3.0.0-bin-without-hadoop上傳到hdfs/spark-jars路徑下:
[xiaobai@hadoop102 spark]$ hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars

注??:
- 由于Spark3.0.0非純凈版默認支持的是hive2.3.7版本,直接使用會和安裝的Hive3.1.2出現兼容性問題,所以采用Spark純凈版jar包,不包含hadoop和hive相關依賴,避免沖突,
- Hive任務最終由Spark來執行,Spark任務資源分配由Yarn來調度,該任務有可能被分配到集群的任何一個節點,所以需要將Spark的依賴上傳到HDFS集群路徑,這樣集群中任何一個節點都能獲取到,
5. 修改hive-site.xml檔案:
- 在/opt/module/hive/conf路徑下修改
hive-site.xml檔案:
[xiaobai@hadoop102 jars]$ vim /opt/module/hive/conf/hive-site.xml
- 增加以下內容:
<!--Spark依賴位置(注意:埠號8020必須和namenode的埠號一致)-->
<property>
<name>spark.yarn.jars</name>
<value>hdfs://hadoop102:8020/spark-jars/*</value>
</property>
<!--Hive執行引擎-->
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>

3.1.3 Hive on Spark測驗
- 啟動hive客戶端:
[xiaobai@hadoop102 jars]$ hive
- 創建一張測驗表
student:
hive (default)> create table student(id int,name string);
OK
Time taken: 2.96 seconds
- 通過insert測驗效果:
hive (default)> insert into table student values(1001,'Tom');
若出現以下結果,則說明配置成功!

這里要是報30041錯誤,請戳這里==>
3.2 Yarn配置
3.2.1 增加ApplicationMaster資源比例
容量調度器對每個資源佇列中同時運行的Application Master占用的資源進行了限制,該限制通過yarn.scheduler.capacity.maximum-am-resource-percent引數實作,其默認值是0.1,表示每個資源佇列上Application Master最多可使用的資源為該佇列總資源的10%,目的是防止大部分資源都被Application Master占用,而導致Map/Reduce Task無法執行,
生產環境該引數可使用默認值,
因本專案使用的是Linux虛擬機,集群資源很少,為防止同一時刻只能運行一個Job的情況出現,將默認值調大為0.8;
- 在hadoop102的/opt/module/hadoop-3.2.2/etc/hadoop/
capacity-scheduler.xml檔案中修改如下引數值:
[xiaobai@hadoop102 hadoop]$ vim capacity-scheduler.xml
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.8</value>
<description>
2.分發capacity-scheduler.xml組態檔:
[xiaobai@hadoop102 hadoop]$ xsync capacity-scheduler.xml
- 正在運行的任務,在
hadoop103關閉重新啟動yarn集群:
[xiaobai@hadoop103 ~]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
[xiaobai@hadoop103 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
可在http://hadoop103:8088/cluster/scheduler–>Application Queues–>>Queue:default下查看默認值和修改后的值:
修改前:
修改后:

3.3 數倉開發環境
數倉開發工具可選用DBeaver或DataGrip,
官方鏈接:
https://www.jetbrains.com/datagrip/
https://dbeaver.io/download/
以下為Mac版本下載??

兩者都需要用到JDBC協議連接到Hive,故需要啟動HiveServer2!
- 啟動
HiveServer2
[xiaobai@hadoop102 hive]$ hiveserver2
- 配置DataGrip連接
1). 創建連接

2). 配置連接屬性:可點擊Test Connection進行連接測驗,隨機點擊OK??

- 測驗使用
創建資料庫gmall,并觀察是否創建成功,
1). 創建資料庫

2). 查看資料庫

3). 修改連接,指明連接資料庫
如圖,需在右上角選擇gmall資料庫,為防止遺忘,可修改properties為gmall

點擊properties按鈕,修改連接:

將Schema設定為我們需要的資料庫gmall:

4). 選擇當前資料庫為gmall;
3.4 資料準備
假定數倉上線的日期為2020-06-14.
1. 用戶行為日志
用戶行為日志,一般是沒有歷史資料的,故日志只需要準備2020-06-14一天的資料,
1). 啟動日志采集通道,包括Flume、Kafak等;
2). 修改兩個日志服務器(hadoop102、hadoop103)中的/opt/module/applog/application.yml組態檔,將mock.date引數改為2020-06-14;
[xiaobai@hadoop102 applog]$ vim application.yml

3). 執行日志生成腳本lg.sh;
[xiaobai@hadoop102 applog]$ lg.sh
4). 查看HDFS是否出現相應檔案??

2. 業務資料
業務資料一般存在歷史資料,此處需準備2020-06-10至2020-06-14的資料,具體操作如下,
1). 修改hadoop102節點上的/opt/module/db_log/application.properties檔案,將mock.date、mock.clear,mock.clear.user三個引數:
[xiaobai@hadoop102 db_log]$ vim application.properties
tips:2020-06-10為第一天資料,所以重置需設為1!

2). 執行模擬生成業務資料的命令,生成第一天2020-06-10的歷史資料:
[xiaobai@hadoop102 db_log]$ java -jar gmall2020-mock-db-2021-01-22.jar
3). 設定第二天2020-06-11的資料,修改引數為:

注??:重置引數只有第一天需要設定為1!
4). 執行模擬生成業務資料的命令,生成第二天2020-06-11的歷史資料:
[xiaobai@hadoop102 db_log]$ java -jar gmall2020-mock-db-2021-01-22.jar
以此類推,設定2020-06-10到2020-06-14多天資料并生成!
5). 在/home/xiaobai/bin目錄下執行mysql_to_hdfs_init.sh腳本,將模擬生成的業務資料同步到HDFS,
[xiaobai@hadoop102 bin]$ ./mysql_to_hdfs_init.sh all 2020-06-14

四、數倉搭建-ODS層
- 保持資料原貌不做任何修改,起到備份資料的作用;
- 資料采用LZO壓縮,減少磁盤存盤空間,100G資料可以壓縮到10G以內;
- 創建磁區表,防止后續的全表掃描,在企業開發中大量使用磁區表;
- 創建外部表,在企業開發中,除了自己用的臨時表,創建內部表外,絕大多數場景都是創建外部表,
4.1 ODS層(用戶行為資料)
4.1.1 創建日志表ods_log
- 創建支持lzo壓縮的磁區表
1). 建表陳述句
create database gmall;
--ODS層
--ods日志表
drop table if exists ods_log;
CREATE EXTERNAL TABLE ods_log (`line` string)
PARTITIONED BY (`dt` string) -- 按照時間創建磁區
STORED AS -- 指定存盤方式,讀資料采用LzoTextInputFormat;
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_log' -- 指定資料在hdfs上的存盤位置
;
2). 磁區規劃

2. 加載資料

4.1.2 ODS層日志表加載資料腳本
- 在/home/xiaobai/bin創建一個
hdfs_to_ods_log.sh檔案:
[xiaobai@hadoop102 bin]$ vim hdfs_to_ods_log.sh
在檔案中添加如下內容:
#!/bin/bash
# 定義變數方便修改
APP=gmall
# 如果是輸入的日期按照取輸入日期;如果沒輸入日期取當前時間的前一天
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
echo ================== 日志日期為 $do_date ==================
sql="
load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log partition(dt='$do_date');
"
hive -e "$sql"
hadoop jar /opt/module/hadoop-3.2.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/$APP/ods/ods_log/dt=$do_date
tips:
[ -n 變數值 ] 判斷變數的值,是否為空;
– 變數的值,非空,回傳true;
– 變數的值,為空,回傳false;
注意:[ -n 變數值 ]不會決議資料,使用[ -n 變數值 ]時,需要對變數加上雙引號(" ");
查看date命令的使用,date --help.
- 增加腳本執行權限:
[xiaobai@hadoop102 bin]$ chmod +x hdfs_to_ods_log.sh
- 腳本使用:
執行腳本:
[xiaobai@hadoop102 bin]$ ./hdfs_to_ods_log.sh 2020-06-14
在dataGrip中查看匯入資料

4.2 ODS層(業務資料)
ODS層業務表磁區規劃如下:

ODS層業務表資料裝載思路如下:

4.2.1 活動資訊表
DROP TABLE IF EXISTS ods_activity_info;
CREATE EXTERNAL TABLE ods_activity_info(
`id` STRING COMMENT '編號',
`activity_name` STRING COMMENT '活動名稱',
`activity_type` STRING COMMENT '活動型別',
`start_time` STRING COMMENT '開始時間',
`end_time` STRING COMMENT '結束時間',
`create_time` STRING COMMENT '創建時間'
) COMMENT '活動資訊表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_info/';
4.2.2 活動規則表
DROP TABLE IF EXISTS ods_activity_rule;
CREATE EXTERNAL TABLE ods_activity_rule(
`id` STRING COMMENT '編號',
`activity_id` STRING COMMENT '活動ID',
`activity_type` STRING COMMENT '活動型別',
`condition_amount` DECIMAL(16,2) COMMENT '滿減金額',
`condition_num` BIGINT COMMENT '滿減件數',
`benefit_amount` DECIMAL(16,2) COMMENT '優惠金額',
`benefit_discount` DECIMAL(16,2) COMMENT '優惠折扣',
`benefit_level` STRING COMMENT '優惠級別'
) COMMENT '活動規則表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_rule/';
4.2.3 一級品類表
DROP TABLE IF EXISTS ods_base_category1;
CREATE EXTERNAL TABLE ods_base_category1(
`id` STRING COMMENT 'id',
`name` STRING COMMENT '名稱'
) COMMENT '商品一級分類表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category1/';
4.2.4 二級品類表
DROP TABLE IF EXISTS ods_base_category2;
CREATE EXTERNAL TABLE ods_base_category2(
`id` STRING COMMENT ' id',
`name` STRING COMMENT '名稱',
`category1_id` STRING COMMENT '一級品類id'
) COMMENT '商品二級分類表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category2/';
4.2.5 三級品類表
DROP TABLE IF EXISTS ods_base_category3;
CREATE EXTERNAL TABLE ods_base_category3(
`id` STRING COMMENT ' id',
`name` STRING COMMENT '名稱',
`category2_id` STRING COMMENT '二級品類id'
) COMMENT '商品三級分類表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category3/';
4.2.6 編碼字典表
DROP TABLE IF EXISTS ods_base_dic;
CREATE EXTERNAL TABLE ods_base_dic(
`dic_code` STRING COMMENT '編號',
`dic_name` STRING COMMENT '編碼名稱',
`parent_code` STRING COMMENT '父編碼',
`create_time` STRING COMMENT '創建日期',
`operate_time` STRING COMMENT '操作日期'
) COMMENT '編碼字典表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_dic/';
4.2.7 省份表
DROP TABLE IF EXISTS ods_base_province;
CREATE EXTERNAL TABLE ods_base_province (
`id` STRING COMMENT '編號',
`name` STRING COMMENT '省份名稱',
`region_id` STRING COMMENT '地區ID',
`area_code` STRING COMMENT '地區編碼',
`iso_code` STRING COMMENT 'ISO-3166編碼,供可視化使用',
`iso_3166_2` STRING COMMENT 'IOS-3166-2編碼,供可視化使用'
) COMMENT '省份表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_province/';
4.2.8 地區表
DROP TABLE IF EXISTS ods_base_region;
CREATE EXTERNAL TABLE ods_base_region (
`id` STRING COMMENT '編號',
`region_name` STRING COMMENT '地區名稱'
) COMMENT '地區表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_region/';
4.2.9 品牌表
DROP TABLE IF EXISTS ods_base_trademark;
CREATE EXTERNAL TABLE ods_base_trademark (
`id` STRING COMMENT '編號',
`tm_name` STRING COMMENT '品牌名稱'
) COMMENT '品牌表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_trademark/';
4.2.10 購物車表
DROP TABLE IF EXISTS ods_cart_info;
CREATE EXTERNAL TABLE ods_cart_info(
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶id',
`sku_id` STRING COMMENT 'skuid',
`cart_price` DECIMAL(16,2) COMMENT '放入購物車時價格',
`sku_num` BIGINT COMMENT '數量',
`sku_name` STRING COMMENT 'sku名稱 (冗余)',
`create_time` STRING COMMENT '創建時間',
`operate_time` STRING COMMENT '修改時間',
`is_ordered` STRING COMMENT '是否已經下單',
`order_time` STRING COMMENT '下單時間',
`source_type` STRING COMMENT '來源型別',
`source_id` STRING COMMENT '來源編號'
) COMMENT '加購表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_cart_info/';
4.2.11 評論表
DROP TABLE IF EXISTS ods_comment_info;
CREATE EXTERNAL TABLE ods_comment_info(
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶ID',
`sku_id` STRING COMMENT '商品sku',
`spu_id` STRING COMMENT '商品spu',
`order_id` STRING COMMENT '訂單ID',
`appraise` STRING COMMENT '評價',
`create_time` STRING COMMENT '評價時間'
) COMMENT '商品評論表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_comment_info/';
4.2.12 優惠券資訊表
DROP TABLE IF EXISTS ods_coupon_info;
CREATE EXTERNAL TABLE ods_coupon_info(
`id` STRING COMMENT '購物券編號',
`coupon_name` STRING COMMENT '購物券名稱',
`coupon_type` STRING COMMENT '購物券型別 1 現金券 2 折扣券 3 滿減券 4 滿件打折券',
`condition_amount` DECIMAL(16,2) COMMENT '滿額數',
`condition_num` BIGINT COMMENT '滿件數',
`activity_id` STRING COMMENT '活動編號',
`benefit_amount` DECIMAL(16,2) COMMENT '減金額',
`benefit_discount` DECIMAL(16,2) COMMENT '折扣',
`create_time` STRING COMMENT '創建時間',
`range_type` STRING COMMENT '范圍型別 1、商品 2、品類 3、品牌',
`limit_num` BIGINT COMMENT '最多領用次數',
`taken_count` BIGINT COMMENT '已領用次數',
`start_time` STRING COMMENT '開始領取時間',
`end_time` STRING COMMENT '結束領取時間',
`operate_time` STRING COMMENT '修改時間',
`expire_time` STRING COMMENT '過期時間'
) COMMENT '優惠券表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_info/';
4.2.13 優惠券領用表
DROP TABLE IF EXISTS ods_coupon_use;
CREATE EXTERNAL TABLE ods_coupon_use(
`id` STRING COMMENT '編號',
`coupon_id` STRING COMMENT '優惠券ID',
`user_id` STRING COMMENT 'skuid',
`order_id` STRING COMMENT 'spuid',
`coupon_status` STRING COMMENT '優惠券狀態',
`get_time` STRING COMMENT '領取時間',
`using_time` STRING COMMENT '使用時間(下單)',
`used_time` STRING COMMENT '使用時間(支付)',
`expire_time` STRING COMMENT '過期時間'
) COMMENT '優惠券領用表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_use/';
4.2.14 收藏表
DROP TABLE IF EXISTS ods_favor_info;
CREATE EXTERNAL TABLE ods_favor_info(
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶id',
`sku_id` STRING COMMENT 'skuid',
`spu_id` STRING COMMENT 'spuid',
`is_cancel` STRING COMMENT '是否取消',
`create_time` STRING COMMENT '收藏時間',
`cancel_time` STRING COMMENT '取消時間'
) COMMENT '商品收藏表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_favor_info/';
4.2.15 訂單明細表
DROP TABLE IF EXISTS ods_order_detail;
CREATE EXTERNAL TABLE ods_order_detail(
`id` STRING COMMENT '編號',
`order_id` STRING COMMENT '訂單號',
`sku_id` STRING COMMENT '商品id',
`sku_name` STRING COMMENT '商品名稱',
`order_price` DECIMAL(16,2) COMMENT '商品價格',
`sku_num` BIGINT COMMENT '商品數量',
`create_time` STRING COMMENT '創建時間',
`source_type` STRING COMMENT '來源型別',
`source_id` STRING COMMENT '來源編號',
`split_final_amount` DECIMAL(16,2) COMMENT '分攤最終金額',
`split_activity_amount` DECIMAL(16,2) COMMENT '分攤活動優惠',
`split_coupon_amount` DECIMAL(16,2) COMMENT '分攤優惠券優惠'
) COMMENT '訂單詳情表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail/';
4.2.16 訂單明細活動關聯表
DROP TABLE IF EXISTS ods_order_detail_activity;
CREATE EXTERNAL TABLE ods_order_detail_activity(
`id` STRING COMMENT '編號',
`order_id` STRING COMMENT '訂單號',
`order_detail_id` STRING COMMENT '訂單明細id',
`activity_id` STRING COMMENT '活動id',
`activity_rule_id` STRING COMMENT '活動規則id',
`sku_id` BIGINT COMMENT '商品id',
`create_time` STRING COMMENT '創建時間'
) COMMENT '訂單詳情活動關聯表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_activity/';
4.2.17 訂單明細優惠券關聯表
DROP TABLE IF EXISTS ods_order_detail_coupon;
CREATE EXTERNAL TABLE ods_order_detail_coupon(
`id` STRING COMMENT '編號',
`order_id` STRING COMMENT '訂單號',
`order_detail_id` STRING COMMENT '訂單明細id',
`coupon_id` STRING COMMENT '優惠券id',
`coupon_use_id` STRING COMMENT '優惠券領用記錄id',
`sku_id` STRING COMMENT '商品id',
`create_time` STRING COMMENT '創建時間'
) COMMENT '訂單詳情活動關聯表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon/';
4.2.18 訂單表
DROP TABLE IF EXISTS ods_order_info;
CREATE EXTERNAL TABLE ods_order_info (
`id` STRING COMMENT '訂單號',
`final_amount` DECIMAL(16,2) COMMENT '訂單最終金額',
`order_status` STRING COMMENT '訂單狀態',
`user_id` STRING COMMENT '用戶id',
`payment_way` STRING COMMENT '支付方式',
`delivery_address` STRING COMMENT '送貨地址',
`out_trade_no` STRING COMMENT '支付流水號',
`create_time` STRING COMMENT '創建時間',
`operate_time` STRING COMMENT '操作時間',
`expire_time` STRING COMMENT '過期時間',
`tracking_no` STRING COMMENT '物流單編號',
`province_id` STRING COMMENT '省份ID',
`activity_reduce_amount` DECIMAL(16,2) COMMENT '活動減免金額',
`coupon_reduce_amount` DECIMAL(16,2) COMMENT '優惠券減免金額',
`original_amount` DECIMAL(16,2) COMMENT '訂單原價金額',
`feight_fee` DECIMAL(16,2) COMMENT '運費',
`feight_fee_reduce` DECIMAL(16,2) COMMENT '運費減免'
) COMMENT '訂單表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_info/';
4.2.19 退單表
DROP TABLE IF EXISTS ods_order_refund_info;
CREATE EXTERNAL TABLE ods_order_refund_info(
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶ID',
`order_id` STRING COMMENT '訂單ID',
`sku_id` STRING COMMENT '商品ID',
`refund_type` STRING COMMENT '退單型別',
`refund_num` BIGINT COMMENT '退單件數',
`refund_amount` DECIMAL(16,2) COMMENT '退單金額',
`refund_reason_type` STRING COMMENT '退單原因型別',
`refund_status` STRING COMMENT '退單狀態',--退單狀態應包含買家申請、賣家審核、賣家識訓、退款完成等狀態,此處未涉及到,故該表按增量處理
`create_time` STRING COMMENT '退單時間'
) COMMENT '退單表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_refund_info/';
4.2.20 訂單狀態日志表
DROP TABLE IF EXISTS ods_order_status_log;
CREATE EXTERNAL TABLE ods_order_status_log (
`id` STRING COMMENT '編號',
`order_id` STRING COMMENT '訂單ID',
`order_status` STRING COMMENT '訂單狀態',
`operate_time` STRING COMMENT '修改時間'
) COMMENT '訂單狀態表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_status_log/';
4.2.21 支付表
DROP TABLE IF EXISTS ods_payment_info;
CREATE EXTERNAL TABLE ods_payment_info(
`id` STRING COMMENT '編號',
`out_trade_no` STRING COMMENT '對外業務編號',
`order_id` STRING COMMENT '訂單編號',
`user_id` STRING COMMENT '用戶編號',
`payment_type` STRING COMMENT '支付型別',
`trade_no` STRING COMMENT '交易編號',
`payment_amount` DECIMAL(16,2) COMMENT '支付金額',
`subject` STRING COMMENT '交易內容',
`payment_status` STRING COMMENT '支付狀態',
`create_time` STRING COMMENT '創建時間',
`callback_time` STRING COMMENT '回呼時間'
) COMMENT '支付流水表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_payment_info/';
4.2.22 退款表
DROP TABLE IF EXISTS ods_refund_payment;
CREATE EXTERNAL TABLE ods_refund_payment(
`id` STRING COMMENT '編號',
`out_trade_no` STRING COMMENT '對外業務編號',
`order_id` STRING COMMENT '訂單編號',
`sku_id` STRING COMMENT 'SKU編號',
`payment_type` STRING COMMENT '支付型別',
`trade_no` STRING COMMENT '交易編號',
`refund_amount` DECIMAL(16,2) COMMENT '支付金額',
`subject` STRING COMMENT '交易內容',
`refund_status` STRING COMMENT '支付狀態',
`create_time` STRING COMMENT '創建時間',
`callback_time` STRING COMMENT '回呼時間'
) COMMENT '支付流水表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_refund_payment/';
4.2.23 商品平臺屬性表
DROP TABLE IF EXISTS ods_sku_attr_value;
CREATE EXTERNAL TABLE ods_sku_attr_value(
`id` STRING COMMENT '編號',
`attr_id` STRING COMMENT '平臺屬性ID',
`value_id` STRING COMMENT '平臺屬性值ID',
`sku_id` STRING COMMENT '商品ID',
`attr_name` STRING COMMENT '平臺屬性名稱',
`value_name` STRING COMMENT '平臺屬性值名稱'
) COMMENT 'sku平臺屬性表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_attr_value/';
4.2.24 商品(SKU)表
DROP TABLE IF EXISTS ods_sku_info;
CREATE EXTERNAL TABLE ods_sku_info(
`id` STRING COMMENT 'skuId',
`spu_id` STRING COMMENT 'spuid',
`price` DECIMAL(16,2) COMMENT '價格',
`sku_name` STRING COMMENT '商品名稱',
`sku_desc` STRING COMMENT '商品描述',
`weight` DECIMAL(16,2) COMMENT '重量',
`tm_id` STRING COMMENT '品牌id',
`category3_id` STRING COMMENT '品類id',
`is_sale` STRING COMMENT '是否在售',
`create_time` STRING COMMENT '創建時間'
) COMMENT 'SKU商品表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_info/';
4.2.25 商品銷售屬性表
DROP TABLE IF EXISTS ods_sku_sale_attr_value;
CREATE EXTERNAL TABLE ods_sku_sale_attr_value(
`id` STRING COMMENT '編號',
`sku_id` STRING COMMENT 'sku_id',
`spu_id` STRING COMMENT 'spu_id',
`sale_attr_value_id` STRING COMMENT '銷售屬性值id',
`sale_attr_id` STRING COMMENT '銷售屬性id',
`sale_attr_name` STRING COMMENT '銷售屬性名稱',
`sale_attr_value_name` STRING COMMENT '銷售屬性值名稱'
) COMMENT 'sku銷售屬性名稱'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value/';
4.2.26 商品(SPU)表
DROP TABLE IF EXISTS ods_spu_info;
CREATE EXTERNAL TABLE ods_spu_info(
`id` STRING COMMENT 'spuid',
`spu_name` STRING COMMENT 'spu名稱',
`category3_id` STRING COMMENT '品類id',
`tm_id` STRING COMMENT '品牌id'
) COMMENT 'SPU商品表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_spu_info/';
4.2.27 用戶表
DROP TABLE IF EXISTS ods_user_info;
CREATE EXTERNAL TABLE ods_user_info(
`id` STRING COMMENT '用戶id',
`login_name` STRING COMMENT '用戶名稱',
`nick_name` STRING COMMENT '用戶昵稱',
`name` STRING COMMENT '用戶姓名',
`phone_num` STRING COMMENT '手機號碼',
`email` STRING COMMENT '郵箱',
`user_level` STRING COMMENT '用戶等級',
`birthday` STRING COMMENT '生日',
`gender` STRING COMMENT '性別',
`create_time` STRING COMMENT '創建時間',
`operate_time` STRING COMMENT '操作時間'
) COMMENT '用戶表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_user_info/';
4.2.28 ODS層業務表首日資料裝載腳本
- 撰寫腳本: 在/home/xiaobai/bin目錄下創建腳本
hdfs_to_ods_db_init.sh:
[xiaobai@hadoop102 bin]$ vim hdfs_to_ods_db_init.sh
在腳本中填寫如下內容:
#!/bin/bash
APP=gmall
if [ -n "$2" ] ;then
do_date=$2
else
echo "請傳入日期引數"
exit
fi
ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date'); "
ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date'); "
ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date'); "
ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date'); "
ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date'); "
ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date'); "
ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date'); "
ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date'); "
ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date'); "
ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date'); "
ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date'); "
ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"
ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date'); "
ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date'); "
ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date'); "
ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date'); "
ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date'); "
ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date'); "
ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"
case $1 in
"ods_order_info"){
hive -e "$ods_order_info"
};;
"ods_order_detail"){
hive -e "$ods_order_detail"
};;
"ods_sku_info"){
hive -e "$ods_sku_info"
};;
"ods_user_info"){
hive -e "$ods_user_info"
};;
"ods_payment_info"){
hive -e "$ods_payment_info"
};;
"ods_base_category1"){
hive -e "$ods_base_category1"
};;
"ods_base_category2"){
hive -e "$ods_base_category2"
};;
"ods_base_category3"){
hive -e "$ods_base_category3"
};;
"ods_base_trademark"){
hive -e "$ods_base_trademark"
};;
"ods_activity_info"){
hive -e "$ods_activity_info"
};;
"ods_cart_info"){
hive -e "$ods_cart_info"
};;
"ods_comment_info"){
hive -e "$ods_comment_info"
};;
"ods_coupon_info"){
hive -e "$ods_coupon_info"
};;
"ods_coupon_use"){
hive -e "$ods_coupon_use"
};;
"ods_favor_info"){
hive -e "$ods_favor_info"
};;
"ods_order_refund_info"){
hive -e "$ods_order_refund_info"
};;
"ods_order_status_log"){
hive -e "$ods_order_status_log"
};;
"ods_spu_info"){
hive -e "$ods_spu_info"
};;
"ods_activity_rule"){
hive -e "$ods_activity_rule"
};;
"ods_base_dic"){
hive -e "$ods_base_dic"
};;
"ods_order_detail_activity"){
hive -e "$ods_order_detail_activity"
};;
"ods_order_detail_coupon"){
hive -e "$ods_order_detail_coupon"
};;
"ods_refund_payment"){
hive -e "$ods_refund_payment"
};;
"ods_sku_attr_value"){
hive -e "$ods_sku_attr_value"
};;
"ods_sku_sale_attr_value"){
hive -e "$ods_sku_sale_attr_value"
};;
"ods_base_province"){
hive -e "$ods_base_province"
};;
"ods_base_region"){
hive -e "$ods_base_region"
};;
"all"){
hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value$ods_base_province$ods_base_region"
};;
esac
- 增加執行權限:
[xiaobai@hadoop102 bin]$ chmod +x hdfs_to_ods_db_init.sh
- 執行腳本
[xiaobai@hadoop102 bin]$ ./hdfs_to_ods_db_init.sh all 2020-06-14
在dataGrip中查看資料是否匯入成功,以下成功!

4.2.29 ODS層業務表每日資料裝載腳本
- 創建腳本: 在/home/xiaobai/bin目錄下創建腳本
hdfs_to_ods_db.sh:
[xiaobai@hadoop102 bin]$ vim hdfs_to_ods_db.sh
在腳本中填寫如下內容:
#!/bin/bash
APP=gmall
# 如果是輸入的日期按照取輸入日期;如果沒輸入日期取當前時間的前一天
if [ -n "$2" ] ;then
do_date=$2
else
do_date=`date -d "-1 day" +%F`
fi
ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date'); "
ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date'); "
ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date'); "
ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date'); "
ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date'); "
ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date'); "
ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date'); "
ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date'); "
ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date'); "
ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date'); "
ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date'); "
ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"
ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date'); "
ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date'); "
ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date'); "
ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date'); "
ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date'); "
ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date'); "
ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"
case $1 in
"ods_order_info"){
hive -e "$ods_order_info"
};;
"ods_order_detail"){
hive -e "$ods_order_detail"
};;
"ods_sku_info"){
hive -e "$ods_sku_info"
};;
"ods_user_info"){
hive -e "$ods_user_info"
};;
"ods_payment_info"){
hive -e "$ods_payment_info"
};;
"ods_base_category1"){
hive -e "$ods_base_category1"
};;
"ods_base_category2"){
hive -e "$ods_base_category2"
};;
"ods_base_category3"){
hive -e "$ods_base_category3"
};;
"ods_base_trademark"){
hive -e "$ods_base_trademark"
};;
"ods_activity_info"){
hive -e "$ods_activity_info"
};;
"ods_cart_info"){
hive -e "$ods_cart_info"
};;
"ods_comment_info"){
hive -e "$ods_comment_info"
};;
"ods_coupon_info"){
hive -e "$ods_coupon_info"
};;
"ods_coupon_use"){
hive -e "$ods_coupon_use"
};;
"ods_favor_info"){
hive -e "$ods_favor_info"
};;
"ods_order_refund_info"){
hive -e "$ods_order_refund_info"
};;
"ods_order_status_log"){
hive -e "$ods_order_status_log"
};;
"ods_spu_info"){
hive -e "$ods_spu_info"
};;
"ods_activity_rule"){
hive -e "$ods_activity_rule"
};;
"ods_base_dic"){
hive -e "$ods_base_dic"
};;
"ods_order_detail_activity"){
hive -e "$ods_order_detail_activity"
};;
"ods_order_detail_coupon"){
hive -e "$ods_order_detail_coupon"
};;
"ods_refund_payment"){
hive -e "$ods_refund_payment"
};;
"ods_sku_attr_value"){
hive -e "$ods_sku_attr_value"
};;
"ods_sku_sale_attr_value"){
hive -e "$ods_sku_sale_attr_value"
};;
"all"){
hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value"
};;
esac
- 修改權限:
[xiaobai@hadoop102 bin]$ chmod +x hdfs_to_ods_db.sh
- 執行腳本:
在2020-06-15資料更新執行!
hdfs_to_ods_db.sh all 2020-06-14
五、數倉搭建-DIM層
5.1 商品維度表(全量)
tips:
商品維度表每日采用全量同步,故首日裝載陳述句與每日裝載陳述句除了日期之外,都相同!
- 建表陳述句
DROP TABLE IF EXISTS dim_sku_info;
CREATE EXTERNAL TABLE dim_sku_info (
`id` STRING COMMENT '商品id',
`price` DECIMAL(16,2) COMMENT '商品價格',
`sku_name` STRING COMMENT '商品名稱',
`sku_desc` STRING COMMENT '商品描述',
`weight` DECIMAL(16,2) COMMENT '重量',
`is_sale` BOOLEAN COMMENT '是否在售',
`spu_id` STRING COMMENT 'spu編號',
`spu_name` STRING COMMENT 'spu名稱',
`category3_id` STRING COMMENT '三級分類id',
`category3_name` STRING COMMENT '三級分類名稱',
`category2_id` STRING COMMENT '二級分類id',
`category2_name` STRING COMMENT '二級分類名稱',
`category1_id` STRING COMMENT '一級分類id',
`category1_name` STRING COMMENT '一級分類名稱',
`tm_id` STRING COMMENT '品牌id',
`tm_name` STRING COMMENT '品牌名稱',
`sku_attr_values` ARRAY<STRUCT<attr_id:STRING,value_id:STRING,attr_name:STRING,value_name:STRING>> COMMENT '平臺屬性',
`sku_sale_attr_values` ARRAY<STRUCT<sale_attr_id:STRING,sale_attr_value_id:STRING,sale_attr_name:STRING,sale_attr_value_name:STRING>> COMMENT '銷售屬性',
`create_time` STRING COMMENT '創建時間'
) COMMENT '商品維度表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_sku_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 磁區規劃
商品維度表磁區:

- 資料裝載

1). 首日裝載
with
sku as
(
select
id,
price,
sku_name,
sku_desc,
weight,
is_sale,
spu_id,
category3_id,
tm_id,
create_time
from ods_sku_info
where dt='2020-06-14'
),
spu as
(
select
id,
spu_name
from ods_spu_info
where dt='2020-06-14'
),
c3 as
(
select
id,
name,
category2_id
from ods_base_category3
where dt='2020-06-14'
),
c2 as
(
select
id,
name,
category1_id
from ods_base_category2
where dt='2020-06-14'
),
c1 as
(
select
id,
name
from ods_base_category1
where dt='2020-06-14'
),
tm as
(
select
id,
tm_name
from ods_base_trademark
where dt='2020-06-14'
),
attr as
(
select
sku_id,
collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
from ods_sku_attr_value
where dt='2020-06-14'
group by sku_id
),
sale_attr as
(
select
sku_id,
collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
from ods_sku_sale_attr_value
where dt='2020-06-14'
group by sku_id
)
insert overwrite table dim_sku_info partition(dt='2020-06-14')
select
sku.id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.is_sale,
sku.spu_id,
spu.spu_name,
sku.category3_id,
c3.name,
c3.category2_id,
c2.name,
c2.category1_id,
c1.name,
sku.tm_id,
tm.tm_name,
attr.attrs,
sale_attr.sale_attrs,
sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
如圖,執行首日裝載陳述句,資料正常匯入!
這里報org.apache.hadoop.hive.ql.parse.SemanticException:Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 65727339-603a-4fca-9df2-2f9d30e4b4a5這個錯誤的戳這里==>

but dim_sku_info表的最后一行標紅部分為例外資料,是因為hive默認情況下對map端的小檔案進行合并導致,而insert陳述句會被決議成一個計算任務來讀取與ods層相關的業務資料,這些資料被壓縮存盤為.lzo 與其對應的索引檔案.index,此2個檔案都較小,所以hive從表中讀取資料時會誤將此2個檔案當作普通小檔案進行合并,這會導致lzo檔案無法切片,
如圖,hive.input.format默認值為hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;故我們需關閉此小檔案合并功能,

?解決方法:
修改CombineHiveInputFormat為HiveInputFormat
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

重新執行首日裝載陳述句,例外資料消失!
注??: 如果當我們讀取的表是lzo檔案型別且為它創建了索引,此時我們就要關閉小檔案合并功能!
2). 每日裝載
with
sku as
(
select
id,
price,
sku_name,
sku_desc,
weight,
is_sale,
spu_id,
category3_id,
tm_id,
create_time
from ods_sku_info
where dt='2020-06-15'
),
spu as
(
select
id,
spu_name
from ods_spu_info
where dt='2020-06-15'
),
c3 as
(
select
id,
name,
category2_id
from ods_base_category3
where dt='2020-06-15'
),
c2 as
(
select
id,
name,
category1_id
from ods_base_category2
where dt='2020-06-15'
),
c1 as
(
select
id,
name
from ods_base_category1
where dt='2020-06-15'
),
tm as
(
select
id,
tm_name
from ods_base_trademark
where dt='2020-06-15'
),
attr as
(
select
sku_id,
collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
from ods_sku_attr_value
where dt='2020-06-15'
group by sku_id
),
sale_attr as
(
select
sku_id,
collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
from ods_sku_sale_attr_value
where dt='2020-06-15'
group by sku_id
)
insert overwrite table dim_sku_info partition(dt='2020-06-15')
select
sku.id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.is_sale,
sku.spu_id,
spu.spu_name,
sku.category3_id,
c3.name,
c3.category2_id,
c2.name,
c2.category1_id,
c1.name,
sku.tm_id,
tm.tm_name,
attr.attrs,
sale_attr.sale_attrs,
sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
優惠券維度表(全量)
tips:
優惠券維度表每日采用全量同步,故首日裝載陳述句與每日裝載陳述句除了日期之外,都相同!
- 建表陳述句
DROP TABLE IF EXISTS dim_coupon_info;
CREATE EXTERNAL TABLE dim_coupon_info(
`id` STRING COMMENT '購物券編號',
`coupon_name` STRING COMMENT '購物券名稱',
`coupon_type` STRING COMMENT '購物券型別 1 現金券 2 折扣券 3 滿減券 4 滿件打折券',
`condition_amount` DECIMAL(16,2) COMMENT '滿額數',
`condition_num` BIGINT COMMENT '滿件數',
`activity_id` STRING COMMENT '活動編號',
`benefit_amount` DECIMAL(16,2) COMMENT '減金額',
`benefit_discount` DECIMAL(16,2) COMMENT '折扣',
`create_time` STRING COMMENT '創建時間',
`range_type` STRING COMMENT '范圍型別 1、商品 2、品類 3、品牌',
`limit_num` BIGINT COMMENT '最多領取次數',
`taken_count` BIGINT COMMENT '已領取次數',
`start_time` STRING COMMENT '可以領取的開始日期',
`end_time` STRING COMMENT '可以領取的結束日期',
`operate_time` STRING COMMENT '修改時間',
`expire_time` STRING COMMENT '過期時間'
) COMMENT '優惠券維度表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_coupon_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 磁區規劃

- 資料裝載

1). 首日裝載
insert overwrite table dim_coupon_info partition(dt='2020-06-14')
select
id,
coupon_name,
coupon_type,
condition_amount,
condition_num,
activity_id,
benefit_amount,
benefit_discount,
create_time,
range_type,
limit_num,
taken_count,
start_time,
end_time,
operate_time,
expire_time
from ods_coupon_info
where dt='2020-06-14';
2). 每日裝載
insert overwrite table dim_coupon_info partition(dt='2020-06-15')
select
id,
coupon_name,
coupon_type,
condition_amount,
condition_num,
activity_id,
benefit_amount,
benefit_discount,
create_time,
range_type,
limit_num,
taken_count,
start_time,
end_time,
operate_time,
expire_time
from ods_coupon_info
where dt='2020-06-15';
5.3 活動維度表(全量)
tip:每行資料為一個活動規則,而非一個活動!
- 建表陳述句
DROP TABLE IF EXISTS dim_activity_rule_info;
CREATE EXTERNAL TABLE dim_activity_rule_info(
`activity_rule_id` STRING COMMENT '活動規則ID',
`activity_id` STRING COMMENT '活動ID',
`activity_name` STRING COMMENT '活動名稱',
`activity_type` STRING COMMENT '活動型別',
`start_time` STRING COMMENT '開始時間',
`end_time` STRING COMMENT '結束時間',
`create_time` STRING COMMENT '創建時間',
`condition_amount` DECIMAL(16,2) COMMENT '滿減金額',
`condition_num` BIGINT COMMENT '滿減件數',
`benefit_amount` DECIMAL(16,2) COMMENT '優惠金額',
`benefit_discount` DECIMAL(16,2) COMMENT '優惠折扣',
`benefit_level` STRING COMMENT '優惠級別'
) COMMENT '活動資訊表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_activity_rule_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
-
磁區規劃

-
資料裝載

1). 首日裝載
insert overwrite table dim_activity_rule_info partition(dt='2020-06-14')
select
ar.id,
ar.activity_id,
ai.activity_name,
ar.activity_type,
ai.start_time,
ai.end_time,
ai.create_time,
ar.condition_amount,
ar.condition_num,
ar.benefit_amount,
ar.benefit_discount,
ar.benefit_level
from
(
select
id,
activity_id,
activity_type,
condition_amount,
condition_num,
benefit_amount,
benefit_discount,
benefit_level
from ods_activity_rule
where dt='2020-06-14'
)ar
left join
(
select
id,
activity_name,
start_time,
end_time,
create_time
from ods_activity_info
where dt='2020-06-14'
)ai
on ar.activity_id=ai.id;
2). 每日轉載
insert overwrite table dim_activity_rule_info partition(dt='2020-06-15')
select
ar.id,
ar.activity_id,
ai.activity_name,
ar.activity_type,
ai.start_time,
ai.end_time,
ai.create_time,
ar.condition_amount,
ar.condition_num,
ar.benefit_amount,
ar.benefit_discount,
ar.benefit_level
from
(
select
id,
activity_id,
activity_type,
condition_amount,
condition_num,
benefit_amount,
benefit_discount,
benefit_level
from ods_activity_rule
where dt='2020-06-15'
)ar
left join
(
select
id,
activity_name,
start_time,
end_time,
create_time
from ods_activity_info
where dt='2020-06-15'
)ai
on ar.activity_id=ai.id;
5.4 地區維度表(特殊)
tips: 地區維度表資料相對穩定,變化概率較低,故無需每日裝載,
- 建表陳述句
DROP TABLE IF EXISTS dim_base_province;
CREATE EXTERNAL TABLE dim_base_province (
`id` STRING COMMENT 'id',
`province_name` STRING COMMENT '省市名稱',
`area_code` STRING COMMENT '地區編碼',
`iso_code` STRING COMMENT 'ISO-3166編碼,供可視化使用',
`iso_3166_2` STRING COMMENT 'IOS-3166-2編碼,供可視化使用',
`region_id` STRING COMMENT '地區id',
`region_name` STRING COMMENT '地區名稱'
) COMMENT '地區維度表'
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_base_province/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 資料裝載

裝載陳述句
insert overwrite table dim_base_province
select
bp.id,
bp.name,
bp.area_code,
bp.iso_code,
bp.iso_3166_2,
bp.region_id,
br.region_name
from ods_base_province bp
join ods_base_region br on bp.region_id = br.id;
5.5 時間維度表(特殊)
通常情況下,時間維度表的資料并不是來自于業務系統,而是手動寫入,并且由于時間維度表資料的可預見性,無須每榷訓入,一般可一次性匯入一年的資料,
- 建表陳述句
DROP TABLE IF EXISTS dim_date_info;
CREATE EXTERNAL TABLE dim_date_info(
`date_id` STRING COMMENT '日',
`week_id` STRING COMMENT '周ID',
`week_day` STRING COMMENT '周幾',
`day` STRING COMMENT '每月的第幾天',
`month` STRING COMMENT '第幾月',
`quarter` STRING COMMENT '第幾季度',
`year` STRING COMMENT '年',
`is_workday` STRING COMMENT '是否是作業日',
`holiday_id` STRING COMMENT '節假日'
) COMMENT '時間維度表'
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_date_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 資料裝載
因節假日較特殊,可以請求網路上的日歷介面,也可將其專門處理,之后將日期屬性寫入檔案date_info.txt,將檔案load到時間維度表dim_date_info中;
因dim_date_info是PARQUET列式存盤+lzo壓縮格式,不能識別date_info.txt文本檔案,故不可直接匯入!可先創建一個不采用PARQUET列式存盤+lzo壓縮格式的臨時表tmp_dim_date_info load date_info.txt,再將資料insert到date_info.txt中,
1). 創建臨時表
DROP TABLE IF EXISTS tmp_dim_date_info;
CREATE EXTERNAL TABLE tmp_dim_date_info (
`date_id` STRING COMMENT '日',
`week_id` STRING COMMENT '周ID',
`week_day` STRING COMMENT '周幾',
`day` STRING COMMENT '每月的第幾天',
`month` STRING COMMENT '第幾月',
`quarter` STRING COMMENT '第幾季度',
`year` STRING COMMENT '年',
`is_workday` STRING COMMENT '是否是作業日',
`holiday_id` STRING COMMENT '節假日'
) COMMENT '時間維度表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/tmp/tmp_dim_date_info/';
2). 將資料檔案上傳到HFDS上臨時表指定路徑/warehouse/gmall/tmp/tmp_dim_date_info/

可以看到臨時表tmp_dim_date_info 表已經生成資料!

3). 執行以下陳述句將其匯入時間維度表
insert overwrite table dim_date_info select * from tmp_dim_date_info;
可以看到時間維度表dim_date_info已經成功匯入資料!

4). 檢查資料是否匯入成功
select * from dim_date_info;
5.6 用戶維度表(拉鏈表)
拉鏈表首日裝載,需要進行初始化操作,具體作業為將截止到初始化當日的全部歷史用戶匯入一次性匯入到拉鏈表中,目前的ods_user_info表的第一個磁區,即2020-06-14磁區中就是全部的歷史用戶,故將該磁區資料進行一定處理后匯入拉鏈表的9999-99-99磁區即可,
5.6.1 拉鏈表概述
- 什么是拉鏈表?

-
為什么要做拉鏈表?

-
如何使用拉鏈表?

-
拉鏈表形成程序

5.6.2 制作拉鏈表
- 建表陳述句
DROP TABLE IF EXISTS dim_user_info;
CREATE EXTERNAL TABLE dim_user_info(
`id` STRING COMMENT '用戶id',
`login_name` STRING COMMENT '用戶名稱',
`nick_name` STRING COMMENT '用戶昵稱',
`name` STRING COMMENT '用戶姓名',
`phone_num` STRING COMMENT '手機號碼',
`email` STRING COMMENT '郵箱',
`user_level` STRING COMMENT '用戶等級',
`birthday` STRING COMMENT '生日',
`gender` STRING COMMENT '性別',
`create_time` STRING COMMENT '創建時間',
`operate_time` STRING COMMENT '操作時間',
`start_date` STRING COMMENT '開始日期',
`end_date` STRING COMMENT '結束日期'
) COMMENT '用戶表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_user_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 磁區規劃

- 資料裝載

1). 首日裝載
insert overwrite table dim_user_info partition(dt='9999-99-99')
select
id,
login_name,
nick_name,
md5(name),
md5(phone_num),
md5(email),
user_level,
birthday,
gender,
create_time,
operate_time,
'2020-06-14',
'9999-99-99'
from ods_user_info
where dt='2020-06-14';
2). 每日裝載
實作思路

sql撰寫
with
tmp as
(
select
old.id old_id,
old.login_name old_login_name,
old.nick_name old_nick_name,
old.name old_name,
old.phone_num old_phone_num,
old.email old_email,
old.user_level old_user_level,
old.birthday old_birthday,
old.gender old_gender,
old.create_time old_create_time,
old.operate_time old_operate_time,
old.start_date old_start_date,
old.end_date old_end_date,
new.id new_id,
new.login_name new_login_name,
new.nick_name new_nick_name,
new.name new_name,
new.phone_num new_phone_num,
new.email new_email,
new.user_level new_user_level,
new.birthday new_birthday,
new.gender new_gender,
new.create_time new_create_time,
new.operate_time new_operate_time,
new.start_date new_start_date,
new.end_date new_end_date
from
(
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date
from dim_user_info
where dt='9999-99-99'
)old
full outer join
(
select
id,
login_name,
nick_name,
md5(name) name,
md5(phone_num) phone_num,
md5(email) email,
user_level,
birthday,
gender,
create_time,
operate_time,
'2020-06-15' start_date,
'9999-99-99' end_date
from ods_user_info
where dt='2020-06-15'
)new
on old.id=new.id
)
insert overwrite table dim_user_info partition(dt)
select
nvl(new_id,old_id),
nvl(new_login_name,old_login_name),
nvl(new_nick_name,old_nick_name),
nvl(new_name,old_name),
nvl(new_phone_num,old_phone_num),
nvl(new_email,old_email),
nvl(new_user_level,old_user_level),
nvl(new_birthday,old_birthday),
nvl(new_gender,old_gender),
nvl(new_create_time,old_create_time),
nvl(new_operate_time,old_operate_time),
nvl(new_start_date,old_start_date),
nvl(new_end_date,old_end_date),
nvl(new_end_date,old_end_date) dt
from tmp
union all
select
old_id,
old_login_name,
old_nick_name,
old_name,
old_phone_num,
old_email,
old_user_level,
old_birthday,
old_gender,
old_create_time,
old_operate_time,
old_start_date,
cast(date_add('2020-06-15',-1) as string),
cast(date_add('2020-06-15',-1) as string) dt
from tmp
where new_id is not null and old_id is not null;
5.7 DIM層首日資料裝載腳本
- 撰寫腳本
1). 在/home/xiaobai/bin目錄下創建腳本ods_to_dim_db_init.sh
[xiaobai@hadoop102 bin]$ vim ods_to_dim_db_init.sh
在腳本中填寫如下內容:
#!/bin/bash
APP=gmall
if [ -n "$2" ] ;then
do_date=$2
else
echo "請傳入日期引數"
exit
fi
dim_user_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_user_info partition(dt='9999-99-99')
select
id,
login_name,
nick_name,
md5(name),
md5(phone_num),
md5(email),
user_level,
birthday,
gender,
create_time,
operate_time,
'$do_date',
'9999-99-99'
from ${APP}.ods_user_info
where dt='$do_date';
"
dim_sku_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
sku as
(
select
id,
price,
sku_name,
sku_desc,
weight,
is_sale,
spu_id,
category3_id,
tm_id,
create_time
from ${APP}.ods_sku_info
where dt='$do_date'
),
spu as
(
select
id,
spu_name
from ${APP}.ods_spu_info
where dt='$do_date'
),
c3 as
(
select
id,
name,
category2_id
from ${APP}.ods_base_category3
where dt='$do_date'
),
c2 as
(
select
id,
name,
category1_id
from ${APP}.ods_base_category2
where dt='$do_date'
),
c1 as
(
select
id,
name
from ${APP}.ods_base_category1
where dt='$do_date'
),
tm as
(
select
id,
tm_name
from ${APP}.ods_base_trademark
where dt='$do_date'
),
attr as
(
select
sku_id,
collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
from ${APP}.ods_sku_attr_value
where dt='$do_date'
group by sku_id
),
sale_attr as
(
select
sku_id,
collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
from ${APP}.ods_sku_sale_attr_value
where dt='$do_date'
group by sku_id
)
insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
select
sku.id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.is_sale,
sku.spu_id,
spu.spu_name,
sku.category3_id,
c3.name,
c3.category2_id,
c2.name,
c2.category1_id,
c1.name,
sku.tm_id,
tm.tm_name,
attr.attrs,
sale_attr.sale_attrs,
sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"
dim_base_province="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_base_province
select
bp.id,
bp.name,
bp.area_code,
bp.iso_code,
bp.iso_3166_2,
bp.region_id,
br.region_name
from ${APP}.ods_base_province bp
join ${APP}.ods_base_region br on bp.region_id = br.id;
"
dim_coupon_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
select
id,
coupon_name,
coupon_type,
condition_amount,
condition_num,
activity_id,
benefit_amount,
benefit_discount,
create_time,
range_type,
limit_num,
taken_count,
start_time,
end_time,
operate_time,
expire_time
from ${APP}.ods_coupon_info
where dt='$do_date';
"
dim_activity_rule_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
select
ar.id,
ar.activity_id,
ai.activity_name,
ar.activity_type,
ai.start_time,
ai.end_time,
ai.create_time,
ar.condition_amount,
ar.condition_num,
ar.benefit_amount,
ar.benefit_discount,
ar.benefit_level
from
(
select
id,
activity_id,
activity_type,
condition_amount,
condition_num,
benefit_amount,
benefit_discount,
benefit_level
from ${APP}.ods_activity_rule
where dt='$do_date'
)ar
left join
(
select
id,
activity_name,
start_time,
end_time,
create_time
from ${APP}.ods_activity_info
where dt='$do_date'
)ai
on ar.activity_id=ai.id;
"
case $1 in
"dim_user_info"){
hive -e "$dim_user_info"
};;
"dim_sku_info"){
hive -e "$dim_sku_info"
};;
"dim_base_province"){
hive -e "$dim_base_province"
};;
"dim_coupon_info"){
hive -e "$dim_coupon_info"
};;
"dim_activity_rule_info"){
hive -e "$dim_activity_rule_info"
};;
"all"){
hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info$dim_base_province"
};;
esac
2). 增加執行權限:見每日裝載腳本!
3). 執行腳本
[xiaobai@hadoop102 bin]$ ./ods_to_dim_db_init.sh all 2020-06-14
4). 查看資料是否匯入成功

5.8 DIM層每日資料裝載腳本
- 撰寫腳本
1). 在/home/xiaobai/bin目錄下創建腳本ods_to_dim_db.sh
[xiaobai@hadoop102 bin]$ vim ods_to_dim_db.sh
2). 在腳本中填寫如下內容:
#!/bin/bash
APP=gmall
# 如果是輸入的日期按照取輸入日期;如果沒輸入日期取當前時間的前一天
if [ -n "$2" ] ;then
do_date=$2
else
do_date=`date -d "-1 day" +%F`
fi
dim_user_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
tmp as
(
select
old.id old_id,
old.login_name old_login_name,
old.nick_name old_nick_name,
old.name old_name,
old.phone_num old_phone_num,
old.email old_email,
old.user_level old_user_level,
old.birthday old_birthday,
old.gender old_gender,
old.create_time old_create_time,
old.operate_time old_operate_time,
old.start_date old_start_date,
old.end_date old_end_date,
new.id new_id,
new.login_name new_login_name,
new.nick_name new_nick_name,
new.name new_name,
new.phone_num new_phone_num,
new.email new_email,
new.user_level new_user_level,
new.birthday new_birthday,
new.gender new_gender,
new.create_time new_create_time,
new.operate_time new_operate_time,
new.start_date new_start_date,
new.end_date new_end_date
from
(
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date
from ${APP}.dim_user_info
where dt='9999-99-99'
and start_date<'$do_date'
)old
full outer join
(
select
id,
login_name,
nick_name,
md5(name) name,
md5(phone_num) phone_num,
md5(email) email,
user_level,
birthday,
gender,
create_time,
operate_time,
'$do_date' start_date,
'9999-99-99' end_date
from ${APP}.ods_user_info
where dt='$do_date'
)new
on old.id=new.id
)
insert overwrite table ${APP}.dim_user_info partition(dt)
select
nvl(new_id,old_id),
nvl(new_login_name,old_login_name),
nvl(new_nick_name,old_nick_name),
nvl(new_name,old_name),
nvl(new_phone_num,old_phone_num),
nvl(new_email,old_email),
nvl(new_user_level,old_user_level),
nvl(new_birthday,old_birthday),
nvl(new_gender,old_gender),
nvl(new_create_time,old_create_time),
nvl(new_operate_time,old_operate_time),
nvl(new_start_date,old_start_date),
nvl(new_end_date,old_end_date),
nvl(new_end_date,old_end_date) dt
from tmp
union all
select
old_id,
old_login_name,
old_nick_name,
old_name,
old_phone_num,
old_email,
old_user_level,
old_birthday,
old_gender,
old_create_time,
old_operate_time,
old_start_date,
cast(date_add('$do_date',-1) as string),
cast(date_add('$do_date',-1) as string) dt
from tmp
where new_id is not null and old_id is not null;
"
dim_sku_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
sku as
(
select
id,
price,
sku_name,
sku_desc,
weight,
is_sale,
spu_id,
category3_id,
tm_id,
create_time
from ${APP}.ods_sku_info
where dt='$do_date'
),
spu as
(
select
id,
spu_name
from ${APP}.ods_spu_info
where dt='$do_date'
),
c3 as
(
select
id,
name,
category2_id
from ${APP}.ods_base_category3
where dt='$do_date'
),
c2 as
(
select
id,
name,
category1_id
from ${APP}.ods_base_category2
where dt='$do_date'
),
c1 as
(
select
id,
name
from ${APP}.ods_base_category1
where dt='$do_date'
),
tm as
(
select
id,
tm_name
from ${APP}.ods_base_trademark
where dt='$do_date'
),
attr as
(
select
sku_id,
collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
from ${APP}.ods_sku_attr_value
where dt='$do_date'
group by sku_id
),
sale_attr as
(
select
sku_id,
collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
from ${APP}.ods_sku_sale_attr_value
where dt='$do_date'
group by sku_id
)
insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
select
sku.id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.is_sale,
sku.spu_id,
spu.spu_name,
sku.category3_id,
c3.name,
c3.category2_id,
c2.name,
c2.category1_id,
c1.name,
sku.tm_id,
tm.tm_name,
attr.attrs,
sale_attr.sale_attrs,
sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"
dim_base_province="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_base_province
select
bp.id,
bp.name,
bp.area_code,
bp.iso_code,
bp.iso_3166_2,
bp.region_id,
bp.name
from ${APP}.ods_base_province bp
join ${APP}.ods_base_region br on bp.region_id = br.id;
"
dim_coupon_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
select
id,
coupon_name,
coupon_type,
condition_amount,
condition_num,
activity_id,
benefit_amount,
benefit_discount,
create_time,
range_type,
limit_num,
taken_count,
start_time,
end_time,
operate_time,
expire_time
from ${APP}.ods_coupon_info
where dt='$do_date';
"
dim_activity_rule_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
select
ar.id,
ar.activity_id,
ai.activity_name,
ar.activity_type,
ai.start_time,
ai.end_time,
ai.create_time,
ar.condition_amount,
ar.condition_num,
ar.benefit_amount,
ar.benefit_discount,
ar.benefit_level
from
(
select
id,
activity_id,
activity_type,
condition_amount,
condition_num,
benefit_amount,
benefit_discount,
benefit_level
from ${APP}.ods_activity_rule
where dt='$do_date'
)ar
left join
(
select
id,
activity_name,
start_time,
end_time,
create_time
from ${APP}.ods_activity_info
where dt='$do_date'
)ai
on ar.activity_id=ai.id;
"
case $1 in
"dim_user_info"){
hive -e "$dim_user_info"
};;
"dim_sku_info"){
hive -e "$dim_sku_info"
};;
"dim_base_province"){
hive -e "$dim_base_province"
};;
"dim_coupon_info"){
hive -e "$dim_coupon_info"
};;
"dim_activity_rule_info"){
hive -e "$dim_activity_rule_info"
};;
"all"){
hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info"
};;
esac
2). 增加執行權限:
[xiaobai@hadoop102 bin]$ chmod +x ods_to_dim_db*
3). 執行腳本
ods_to_dim_db.sh all 2020-06-14
六、數倉搭建-DWD層
- 對用戶行為資料決議;
- 對業務資料采用維度模型重新建模,
6.1 DWD層 (用戶行為日志)
6.1.1 日志決議思路
- 日志結構
1). 頁面埋點日志

2). 啟動日志

- 日志決議思路

6.1.2 get_json_object函式使用
- 資料
[{"name":"小明","sex":"男","age":"25"},{"name":"小紅","sex":"女","age":"23"}]
- 取出第一個json物件:
select get_json_object('[{"name":"小明","sex":"男","age":"25"},{"name":"小紅","sex":"女","age":"23"}]','$[0]')
結果:

- 取出第一個json的age欄位的值:
select get_json_object('[{"name":"小明","sex":"男","age":"25"},{"name":"小紅","sex":"女","age":"23"}]','$[0].age')
結果:

6.1.3 啟動日志表
啟動日志決議思路:啟動日志表中每行資料對應一個啟動記錄,一個啟動記錄應該包含日志中的公共資訊和啟動資訊,先將所有包含start欄位的日志過濾出來,然后使用get_json_object函式決議每個欄位,

- 建表陳述句
DROP TABLE IF EXISTS dwd_start_log;
CREATE EXTERNAL TABLE dwd_start_log(
`area_code` STRING COMMENT '地區編碼',
`brand` STRING COMMENT '手機品牌',
`channel` STRING COMMENT '渠道',
`is_new` STRING COMMENT '是否首次啟動',
`model` STRING COMMENT '手機型號',
`mid_id` STRING COMMENT '設備id',
`os` STRING COMMENT '作業系統',
`user_id` STRING COMMENT '會員id',
`version_code` STRING COMMENT 'app版本號',
`entry` STRING COMMENT 'icon手機圖示 notice 通知 install 安裝后啟動',
`loading_time` BIGINT COMMENT '啟動加載時間',
`open_ad_id` STRING COMMENT '廣告頁ID ',
`open_ad_ms` BIGINT COMMENT '廣告總共播放時間',
`open_ad_skip_ms` BIGINT COMMENT '用戶跳過廣告時點',
`ts` BIGINT COMMENT '時間'
) COMMENT '啟動日志表'
PARTITIONED BY (`dt` STRING) -- 按照時間創建磁區
STORED AS PARQUET -- 采用parquet列式存盤
LOCATION '/warehouse/gmall/dwd/dwd_start_log' -- 指定在HDFS上存盤位置
TBLPROPERTIES('parquet.compression'='lzo') -- 采用LZO壓縮;
- 資料裝載

首日與每日相同!
hive (gmall)>
insert overwrite table dwd_start_log partition(dt='2020-06-14')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.start.entry'),
get_json_object(line,'$.start.loading_time'),
get_json_object(line,'$.start.open_ad_id'),
get_json_object(line,'$.start.open_ad_ms'),
get_json_object(line,'$.start.open_ad_skip_ms'),
get_json_object(line,'$.ts')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.start') is not null;
- 查看資料
select * from dwd_start_log where dt='2020-06-14' limit 2;
6.1.4 頁面日志表
**頁面日志決議思路:**頁面日志表中每行資料對應一個頁面訪問記錄,一個頁面訪問記錄應該包含日志中的公共資訊和頁面資訊,先將所有包含page欄位的日志過濾出來,然后使用get_json_object函式決議每個欄位,

- 建表陳述句
DROP TABLE IF EXISTS dwd_page_log;
CREATE EXTERNAL TABLE dwd_page_log(
`area_code` STRING COMMENT '地區編碼',
`brand` STRING COMMENT '手機品牌',
`channel` STRING COMMENT '渠道',
`is_new` STRING COMMENT '是否首次啟動',
`model` STRING COMMENT '手機型號',
`mid_id` STRING COMMENT '設備id',
`os` STRING COMMENT '作業系統',
`user_id` STRING COMMENT '會員id',
`version_code` STRING COMMENT 'app版本號',
`during_time` BIGINT COMMENT '持續時間毫秒',
`page_item` STRING COMMENT '目標id ',
`page_item_type` STRING COMMENT '目標型別',
`last_page_id` STRING COMMENT '上頁型別',
`page_id` STRING COMMENT '頁面ID ',
`source_type` STRING COMMENT '來源型別',
`ts` bigint
) COMMENT '頁面日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_page_log'
TBLPROPERTIES('parquet.compression'='lzo');
- 資料裝載
insert overwrite table dwd_page_log partition(dt='2020-06-14')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.page.during_time'),
get_json_object(line,'$.page.item'),
get_json_object(line,'$.page.item_type'),
get_json_object(line,'$.page.last_page_id'),
get_json_object(line,'$.page.page_id'),
get_json_object(line,'$.page.source_type'),
get_json_object(line,'$.ts')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.page') is not null;
- 查看資料:
select * from dwd_page_log where dt='2020-06-14' limit 2;
6.1.5 動作日志表
動作日志決議思路:動作日志表中每行資料對應用戶的一個動作記錄,一個動作記錄應當包含公共資訊、頁面資訊以及動作資訊,先將包含action欄位的日志過濾出來,然后通過UDTF函式,將action陣列“炸開”(類似于explode函式的效果),然后使用get_json_object函式決議每個欄位,

- 建表陳述句
DROP TABLE IF EXISTS dwd_action_log;
CREATE EXTERNAL TABLE dwd_action_log(
`area_code` STRING COMMENT '地區編碼',
`brand` STRING COMMENT '手機品牌',
`channel` STRING COMMENT '渠道',
`is_new` STRING COMMENT '是否首次啟動',
`model` STRING COMMENT '手機型號',
`mid_id` STRING COMMENT '設備id',
`os` STRING COMMENT '作業系統',
`user_id` STRING COMMENT '會員id',
`version_code` STRING COMMENT 'app版本號',
`during_time` BIGINT COMMENT '持續時間毫秒',
`page_item` STRING COMMENT '目標id ',
`page_item_type` STRING COMMENT '目標型別',
`last_page_id` STRING COMMENT '上頁型別',
`page_id` STRING COMMENT '頁面id ',
`source_type` STRING COMMENT '來源型別',
`action_id` STRING COMMENT '動作id',
`item` STRING COMMENT '目標id ',
`item_type` STRING COMMENT '目標型別',
`ts` BIGINT COMMENT '時間'
) COMMENT '動作日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_action_log'
TBLPROPERTIES('parquet.compression'='lzo');
- 創建UDTF函式——設計思路


- 創建UDTF函式——撰寫代碼
1). 創建一個maven工程:hivefunction
2). 創建包名:com.atguigu.hive.udtf
3). 引入如下依賴
<dependencies>
<!--添加hive依賴-->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>3.1.2</version>
</dependency>
</dependencies>
4). 編碼
package com.xiaobai.gmall.hive.udtf;
import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.json.JSONArray;
public class ExplodeJSONArray extends GenericUDTF {
private PrimitiveObjectInspector inputOI;
@Override
public void process(Object[] args) throws HiveException {
Object arg = args[0];
String jsonArrayStr = PrimitiveObjectInspectorUtils.getString(arg, inputOI);
JSONArray jsonArray = new JSONArray(jsonArrayStr);
for (int i = 0; i < jsonArray.length(); i++) {
String json = jsonArray.getString(i);
String[] result = {json};
forward(result);
}
}
@Override
public void close() throws HiveException {
}
@Override
public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
if(argOIs.length!= 1){
throw new UDFArgumentException("explode_json_array函式只能接收1個函式");
}
ObjectInspector argOI = argOIs[0];
if(argOI.getCategory()!= ObjectInspector.Category.PRIMITIVE){
throw new UDFArgumentException("explode_json_array函式只能接識訓本資料型別的函式");
}
PrimitiveObjectInspector primitiveOI = (PrimitiveObjectInspector) argOI;
inputOI=primitiveOI;
if(primitiveOI.getPrimitiveCategory()!= PrimitiveObjectInspector.PrimitiveCategory.STRING){
throw new UDFArgumentException("explode_json_array函式只能接收STRING型別的函式");
}
ArrayList<String> fieldNames = new ArrayList<String>();
ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
fieldNames.add("item");
fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,fieldOIs);
}
}
- 創建函式
1). 打包
2). 將gmall-udtf-1.0-SNAPSHOT.jar上傳到hadoop102的/opt/module/路徑下,然后再將該jar包上傳到HDFS的/user/hive/jars路徑下:
可手動在hdfs上創建jars目錄并上傳jarbao,也可通過命令上傳:
[xiaobai@hadoop102 module]$ hadoop fs -mkdir -p /user/hive/jars
[xiaobai@hadoop102 module]$ hadoop fs -put hivefunction-1.0-SNAPSHOT.jar /user/hive/jars

3). 創建永久函式與開發好的java class關聯:
create function explode_json_array as 'com.xiaobai.gmall.hive.udtf.ExplodeJSONArray' using jar 'hdfs://hadoop102:8020/user/hive/jars/gmall-udtf-1.0-SNAPSHOT.jar';
注??:
如果修改了自定義函式重新生成jar包怎么處理?只需要替換HDFS路徑上的舊jar包,然后重啟Hive客戶端即可,
- 資料匯入
insert overwrite table dwd_action_log partition(dt='2020-06-14')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.page.during_time'),
get_json_object(line,'$.page.item'),
get_json_object(line,'$.page.item_type'),
get_json_object(line,'$.page.last_page_id'),
get_json_object(line,'$.page.page_id'),
get_json_object(line,'$.page.source_type'),
get_json_object(action,'$.action_id'),
get_json_object(action,'$.item'),
get_json_object(action,'$.item_type'),
get_json_object(action,'$.ts')
from ods_log lateral view explode_json_array(get_json_object(line,'$.actions')) tmp as action
where dt='2020-06-14'
and get_json_object(line,'$.actions') is not null;
- 查看資料
select * from dwd_action_log where dt='2020-06-14' limit 2;
6.1.6 曝光日志表
曝光日志決議思路:曝光日志表中每行資料對應一個曝光記錄,一個曝光記錄應當包含公共資訊、頁面資訊以及曝光資訊,先將包含display欄位的日志過濾出來,然后通過UDTF函式,將display陣列“炸開”(類似于explode函式的效果),然后使用get_json_object函式決議每個欄位,

- 建表陳述句
DROP TABLE IF EXISTS dwd_display_log;
CREATE EXTERNAL TABLE dwd_display_log(
`area_code` STRING COMMENT '地區編碼',
`brand` STRING COMMENT '手機品牌',
`channel` STRING COMMENT '渠道',
`is_new` STRING COMMENT '是否首次啟動',
`model` STRING COMMENT '手機型號',
`mid_id` STRING COMMENT '設備id',
`os` STRING COMMENT '作業系統',
`user_id` STRING COMMENT '會員id',
`version_code` STRING COMMENT 'app版本號',
`during_time` BIGINT COMMENT 'app版本號',
`page_item` STRING COMMENT '目標id ',
`page_item_type` STRING COMMENT '目標型別',
`last_page_id` STRING COMMENT '上頁型別',
`page_id` STRING COMMENT '頁面ID ',
`source_type` STRING COMMENT '來源型別',
`ts` BIGINT COMMENT 'app版本號',
`display_type` STRING COMMENT '曝光型別',
`item` STRING COMMENT '曝光物件id ',
`item_type` STRING COMMENT 'app版本號',
`order` BIGINT COMMENT '曝光順序',
`pos_id` BIGINT COMMENT '曝光位置'
) COMMENT '曝光日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_display_log'
TBLPROPERTIES('parquet.compression'='lzo');
- 資料匯入
insert overwrite table dwd_display_log partition(dt='2020-06-14')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.page.during_time'),
get_json_object(line,'$.page.item'),
get_json_object(line,'$.page.item_type'),
get_json_object(line,'$.page.last_page_id'),
get_json_object(line,'$.page.page_id'),
get_json_object(line,'$.page.source_type'),
get_json_object(line,'$.ts'),
get_json_object(display,'$.display_type'),
get_json_object(display,'$.item'),
get_json_object(display,'$.item_type'),
get_json_object(display,'$.order'),
get_json_object(display,'$.pos_id')
from ods_log lateral view explode_json_array(get_json_object(line,'$.displays')) tmp as display
where dt='2020-06-14'
and get_json_object(line,'$.displays') is not null;
- 查看資料
select * from dwd_display_log where dt='2020-06-14' limit 2;
6.1.7 錯誤日志表
錯誤日志決議思路:錯誤日志表中每行資料對應一個錯誤記錄,為方便定位錯誤,一個錯誤記錄應當包含與之對應的公共資訊、頁面資訊、曝光資訊、動作資訊、啟動資訊以及錯誤資訊,先將包含err欄位的日志過濾出來,然后使用get_json_object函式決議所有欄位,

- 建表陳述句
DROP TABLE IF EXISTS dwd_error_log;
CREATE EXTERNAL TABLE dwd_error_log(
`area_code` STRING COMMENT '地區編碼',
`brand` STRING COMMENT '手機品牌',
`channel` STRING COMMENT '渠道',
`is_new` STRING COMMENT '是否首次啟動',
`model` STRING COMMENT '手機型號',
`mid_id` STRING COMMENT '設備id',
`os` STRING COMMENT '作業系統',
`user_id` STRING COMMENT '會員id',
`version_code` STRING COMMENT 'app版本號',
`page_item` STRING COMMENT '目標id ',
`page_item_type` STRING COMMENT '目標型別',
`last_page_id` STRING COMMENT '上頁型別',
`page_id` STRING COMMENT '頁面ID ',
`source_type` STRING COMMENT '來源型別',
`entry` STRING COMMENT ' icon手機圖示 notice 通知 install 安裝后啟動',
`loading_time` STRING COMMENT '啟動加載時間',
`open_ad_id` STRING COMMENT '廣告頁ID ',
`open_ad_ms` STRING COMMENT '廣告總共播放時間',
`open_ad_skip_ms` STRING COMMENT '用戶跳過廣告時點',
`actions` STRING COMMENT '動作',
`displays` STRING COMMENT '曝光',
`ts` STRING COMMENT '時間',
`error_code` STRING COMMENT '錯誤碼',
`msg` STRING COMMENT '錯誤資訊'
) COMMENT '錯誤日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_error_log'
TBLPROPERTIES('parquet.compression'='lzo');
注??:
此處為對動作陣列和曝光陣列做處理,如需分析錯誤與單個動作或曝光的關聯,可先使用explode_json_array函式將陣列“炸開”,再使用get_json_object函式獲取具體欄位,
- 資料裝載
insert overwrite table dwd_error_log partition(dt='2020-06-14')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.page.item'),
get_json_object(line,'$.page.item_type'),
get_json_object(line,'$.page.last_page_id'),
get_json_object(line,'$.page.page_id'),
get_json_object(line,'$.page.source_type'),
get_json_object(line,'$.start.entry'),
get_json_object(line,'$.start.loading_time'),
get_json_object(line,'$.start.open_ad_id'),
get_json_object(line,'$.start.open_ad_ms'),
get_json_object(line,'$.start.open_ad_skip_ms'),
get_json_object(line,'$.actions'),
get_json_object(line,'$.displays'),
get_json_object(line,'$.ts'),
get_json_object(line,'$.err.error_code'),
get_json_object(line,'$.err.msg')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.err') is not null;
6.1.8 DWD層用戶行為資料加載腳本
- 撰寫腳本
在hadoop102的/home/xiaobai/bin目錄下創建腳本
[xiaobai@hadoop102 bin]$ vim ods_to_dwd_log.sh
填入以下內容:
#!/bin/bash
APP=gmall
# 如果是輸入的日期按照取輸入日期;如果沒輸入日期取當前時間的前一天
if [ -n "$2" ] ;then
do_date=$2
else
do_date=`date -d "-1 day" +%F`
fi
dwd_start_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_start_log partition(dt='$do_date')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.start.entry'),
get_json_object(line,'$.start.loading_time'),
get_json_object(line,'$.start.open_ad_id'),
get_json_object(line,'$.start.open_ad_ms'),
get_json_object(line,'$.start.open_ad_skip_ms'),
get_json_object(line,'$.ts')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.start') is not null;"
dwd_page_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_page_log partition(dt='$do_date')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.page.during_time'),
get_json_object(line,'$.page.item'),
get_json_object(line,'$.page.item_type'),
get_json_object(line,'$.page.last_page_id'),
get_json_object(line,'$.page.page_id'),
get_json_object(line,'$.page.source_type'),
get_json_object(line,'$.ts')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.page') is not null;"
dwd_action_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_action_log partition(dt='$do_date')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.page.during_time'),
get_json_object(line,'$.page.item'),
get_json_object(line,'$.page.item_type'),
get_json_object(line,'$.page.last_page_id'),
get_json_object(line,'$.page.page_id'),
get_json_object(line,'$.page.source_type'),
get_json_object(action,'$.action_id'),
get_json_object(action,'$.item'),
get_json_object(action,'$.item_type'),
get_json_object(action,'$.ts')
from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.actions')) tmp as action
where dt='$do_date'
and get_json_object(line,'$.actions') is not null;"
dwd_display_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_display_log partition(dt='$do_date')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.page.during_time'),
get_json_object(line,'$.page.item'),
get_json_object(line,'$.page.item_type'),
get_json_object(line,'$.page.last_page_id'),
get_json_object(line,'$.page.page_id'),
get_json_object(line,'$.page.source_type'),
get_json_object(line,'$.ts'),
get_json_object(display,'$.display_type'),
get_json_object(display,'$.item'),
get_json_object(display,'$.item_type'),
get_json_object(display,'$.order'),
get_json_object(display,'$.pos_id')
from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.displays')) tmp as display
where dt='$do_date'
and get_json_object(line,'$.displays') is not null;"
dwd_error_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_error_log partition(dt='$do_date')
select
get_json_object(line,'$.common.ar'),
get_json_object(line,'$.common.ba'),
get_json_object(line,'$.common.ch'),
get_json_object(line,'$.common.is_new'),
get_json_object(line,'$.common.md'),
get_json_object(line,'$.common.mid'),
get_json_object(line,'$.common.os'),
get_json_object(line,'$.common.uid'),
get_json_object(line,'$.common.vc'),
get_json_object(line,'$.page.item'),
get_json_object(line,'$.page.item_type'),
get_json_object(line,'$.page.last_page_id'),
get_json_object(line,'$.page.page_id'),
get_json_object(line,'$.page.source_type'),
get_json_object(line,'$.start.entry'),
get_json_object(line,'$.start.loading_time'),
get_json_object(line,'$.start.open_ad_id'),
get_json_object(line,'$.start.open_ad_ms'),
get_json_object(line,'$.start.open_ad_skip_ms'),
get_json_object(line,'$.actions'),
get_json_object(line,'$.displays'),
get_json_object(line,'$.ts'),
get_json_object(line,'$.err.error_code'),
get_json_object(line,'$.err.msg')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.err') is not null;"
case $1 in
dwd_start_log )
hive -e "$dwd_start_log"
;;
dwd_page_log )
hive -e "$dwd_page_log"
;;
dwd_action_log )
hive -e "$dwd_action_log"
;;
dwd_display_log )
hive -e "$dwd_display_log"
;;
dwd_error_log )
hive -e "$dwd_error_log"
;;
all )
hive -e "$dwd_start_log$dwd_page_log$dwd_action_log$dwd_display_log$dwd_error_log"
;;
esac
- 增加腳本執行權限
[xiaobai@hadoop102 bin]$ chmod +x ods_to_dwd_log.sh
- 執行腳本
[xiaobai@hadoop102 bin]$ ./ods_to_dwd_log.sh all 2020-06-14
查看匯入結果:
6.2 DWD層(業務資料)
業務資料方面DWD層的搭建主要注意點在于維度建模,
6.2.1 評價事實表(事務型事實表)
- 建表陳述句
DROP TABLE IF EXISTS dwd_comment_info;
CREATE EXTERNAL TABLE dwd_comment_info(
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶ID',
`sku_id` STRING COMMENT '商品sku',
`spu_id` STRING COMMENT '商品spu',
`order_id` STRING COMMENT '訂單ID',
`appraise` STRING COMMENT '評價(好評、中評、差評、默認評價)',
`create_time` STRING COMMENT '評價時間'
) COMMENT '評價事實表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_comment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
-
磁區規劃

-
資料裝載

1). 首日裝載
insert overwrite table dwd_comment_info partition (dt)
select
id,
user_id,
sku_id,
spu_id,
order_id,
appraise,
create_time,
date_format(create_time,'yyyy-MM-dd')
from ods_comment_info
where dt='2020-06-14';
2). 每日裝載
insert overwrite table dwd_comment_info partition(dt='2020-06-14')
select
id,
user_id,
sku_id,
spu_id,
order_id,
appraise,
create_time
from ods_comment_info
where dt='2020-06-15'
6.2.2 訂單明細事實表(事務型事實表)
- 建表陳述句
DROP TABLE IF EXISTS dwd_order_detail;
CREATE EXTERNAL TABLE dwd_order_detail (
`id` STRING COMMENT '訂單編號',
`order_id` STRING COMMENT '訂單號',
`user_id` STRING COMMENT '用戶id',
`sku_id` STRING COMMENT 'sku商品id',
`province_id` STRING COMMENT '省份ID',
`activity_id` STRING COMMENT '活動ID',
`activity_rule_id` STRING COMMENT '活動規則ID',
`coupon_id` STRING COMMENT '優惠券ID',
`create_time` STRING COMMENT '創建時間',
`source_type` STRING COMMENT '來源型別',
`source_id` STRING COMMENT '來源編號',
`sku_num` BIGINT COMMENT '商品數量',
`original_amount` DECIMAL(16,2) COMMENT '原始價格',
`split_activity_amount` DECIMAL(16,2) COMMENT '活動優惠分攤',
`split_coupon_amount` DECIMAL(16,2) COMMENT '優惠券優惠分攤',
`split_final_amount` DECIMAL(16,2) COMMENT '最終價格分攤'
) COMMENT '訂單明細事實表表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_detail/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 磁區規劃

- 資料裝載

1). 首日裝載
insert overwrite table dwd_order_detail partition(dt)
select
od.id,
od.order_id,
oi.user_id,
od.sku_id,
oi.province_id,
oda.activity_id,
oda.activity_rule_id,
odc.coupon_id,
od.create_time,
od.source_type,
od.source_id,
od.sku_num,
od.order_price*od.sku_num,
od.split_activity_amount,
od.split_coupon_amount,
od.split_final_amount,
date_format(create_time,'yyyy-MM-dd')
from
(
select
*
from ods_order_detail
where dt='2020-06-14'
)od
left join
(
select
id,
user_id,
province_id
from ods_order_info
where dt='2020-06-14'
)oi
on od.order_id=oi.id
left join
(
select
order_detail_id,
activity_id,
activity_rule_id
from ods_order_detail_activity
where dt='2020-06-14'
)oda
on od.id=oda.order_detail_id
left join
(
select
order_detail_id,
coupon_id
from ods_order_detail_coupon
where dt='2020-06-14'
)odc
on od.id=odc.order_detail_id;
2). 每日裝載
insert overwrite table dwd_order_detail partition(dt='2020-06-15')
select
od.id,
od.order_id,
oi.user_id,
od.sku_id,
oi.province_id,
oda.activity_id,
oda.activity_rule_id,
odc.coupon_id,
od.create_time,
od.source_type,
od.source_id,
od.sku_num,
od.order_price*od.sku_num,
od.split_activity_amount,
od.split_coupon_amount,
od.split_final_amount
from
(
select
*
from ods_order_detail
where dt='2020-06-15'
)od
left join
(
select
id,
user_id,
province_id
from ods_order_info
where dt='2020-06-15'
)oi
on od.order_id=oi.id
left join
(
select
order_detail_id,
activity_id,
activity_rule_id
from ods_order_detail_activity
where dt='2020-06-15'
)oda
on od.id=oda.order_detail_id
left join
(
select
order_detail_id,
coupon_id
from ods_order_detail_coupon
where dt='2020-06-15'
)odc
on od.id=odc.order_detail_id;
6.2.3 退單事實表(事務型事實表)
- 建表陳述句
DROP TABLE IF EXISTS dwd_order_refund_info;
CREATE EXTERNAL TABLE dwd_order_refund_info(
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶ID',
`order_id` STRING COMMENT '訂單ID',
`sku_id` STRING COMMENT '商品ID',
`province_id` STRING COMMENT '地區ID',
`refund_type` STRING COMMENT '退單型別',
`refund_num` BIGINT COMMENT '退單件數',
`refund_amount` DECIMAL(16,2) COMMENT '退單金額',
`refund_reason_type` STRING COMMENT '退單原因型別',
`create_time` STRING COMMENT '退單時間'
) COMMENT '退單事實表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_refund_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 磁區規劃

- 資料裝載

1). 首日裝載
insert overwrite table dwd_order_refund_info partition(dt)
select
ri.id,
ri.user_id,
ri.order_id,
ri.sku_id,
oi.province_id,
ri.refund_type,
ri.refund_num,
ri.refund_amount,
ri.refund_reason_type,
ri.create_time,
date_format(ri.create_time,'yyyy-MM-dd')
from
(
select * from ods_order_refund_info where dt='2020-06-14'
)ri
left join
(
select id,province_id from ods_order_info where dt='2020-06-14'
)oi
on ri.order_id=oi.id;
2). 每日裝載
insert overwrite table dwd_order_refund_info partition(dt='2020-06-15')
select
ri.id,
ri.user_id,
ri.order_id,
ri.sku_id,
oi.province_id,
ri.refund_type,
ri.refund_num,
ri.refund_amount,
ri.refund_reason_type,
ri.create_time
from
(
select * from ods_order_refund_info where dt='2020-06-15'
)ri
left join
(
select id,province_id from ods_order_info where dt='2020-06-15'
)oi
on ri.order_id=oi.id;
6.2.4 加購事實表(周期型快照事實表,每日快照)
- 建表陳述句
DROP TABLE IF EXISTS dwd_cart_info;
CREATE EXTERNAL TABLE dwd_cart_info(
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶ID',
`sku_id` STRING COMMENT '商品ID',
`source_type` STRING COMMENT '來源型別',
`source_id` STRING COMMENT '來源編號',
`cart_price` DECIMAL(16,2) COMMENT '加入購物車時的價格',
`is_ordered` STRING COMMENT '是否已下單',
`create_time` STRING COMMENT '創建時間',
`operate_time` STRING COMMENT '修改時間',
`order_time` STRING COMMENT '下單時間',
`sku_num` BIGINT COMMENT '加購數量'
) COMMENT '加購事實表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_cart_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 磁區規劃

- 資料裝載

1). 首日裝載
insert overwrite table dwd_cart_info partition(dt='2020-06-14')
select
id,
user_id,
sku_id,
source_type,
source_id,
cart_price,
is_ordered,
create_time,
operate_time,
order_time,
sku_num
from ods_cart_info
where dt='2020-06-14';
2). 每日裝載
insert overwrite table dwd_cart_info partition(dt='2020-06-15')
select
id,
user_id,
sku_id,
source_type,
source_id,
cart_price,
is_ordered,
create_time,
operate_time,
order_time,
sku_num
from ods_cart_info
where dt='2020-06-15';
6.2.5 收藏事實表(周期型快照事實表,每日快照)
- 建表陳述句
DROP TABLE IF EXISTS dwd_favor_info;
CREATE EXTERNAL TABLE dwd_favor_info(
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶id',
`sku_id` STRING COMMENT 'skuid',
`spu_id` STRING COMMENT 'spuid',
`is_cancel` STRING COMMENT '是否取消',
`create_time` STRING COMMENT '收藏時間',
`cancel_time` STRING COMMENT '取消時間'
) COMMENT '收藏事實表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_favor_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
-
磁區規劃

-
資料裝載

1).首日裝載
insert overwrite table dwd_favor_info partition(dt='2020-06-14')
select
id,
user_id,
sku_id,
spu_id,
is_cancel,
create_time,
cancel_time
from ods_favor_info
where dt='2020-06-14';
2). 每日裝載
insert overwrite table dwd_favor_info partition(dt='2020-06-15')
select
id,
user_id,
sku_id,
spu_id,
is_cancel,
create_time,
cancel_time
from ods_favor_info
where dt='2020-06-15';
6.2.6 優惠券領用事實表(累積型快照事實表)
- 建表陳述句
DROP TABLE IF EXISTS dwd_coupon_use;
CREATE EXTERNAL TABLE dwd_coupon_use(
`id` STRING COMMENT '編號',
`coupon_id` STRING COMMENT '優惠券ID',
`user_id` STRING COMMENT 'userid',
`order_id` STRING COMMENT '訂單id',
`coupon_status` STRING COMMENT '優惠券狀態',
`get_time` STRING COMMENT '領取時間',
`using_time` STRING COMMENT '使用時間(下單)',
`used_time` STRING COMMENT '使用時間(支付)',
`expire_time` STRING COMMENT '過期時間'
) COMMENT '優惠券領用事實表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_coupon_use/'
TBLPROPERTIES ("parquet.compression"="lzo");
-
磁區規劃

-
資料裝載

1). 首日裝載
insert overwrite table dwd_coupon_use partition(dt)
select
id,
coupon_id,
user_id,
order_id,
coupon_status,
get_time,
using_time,
used_time,
expire_time,
coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
from ods_coupon_use
where dt='2020-06-14';
2). 每日裝載
a.裝載邏輯

b.轉載陳述句
insert overwrite table dwd_coupon_use partition(dt)
select
nvl(new.id,old.id),
nvl(new.coupon_id,old.coupon_id),
nvl(new.user_id,old.user_id),
nvl(new.order_id,old.order_id),
nvl(new.coupon_status,old.coupon_status),
nvl(new.get_time,old.get_time),
nvl(new.using_time,old.using_time),
nvl(new.used_time,old.used_time),
nvl(new.expire_time,old.expire_time),
coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
from
(
select
id,
coupon_id,
user_id,
order_id,
coupon_status,
get_time,
using_time,
used_time,
expire_time
from dwd_coupon_use
where dt='9999-99-99'
)old
full outer join
(
select
id,
coupon_id,
user_id,
order_id,
coupon_status,
get_time,
using_time,
used_time,
expire_time
from ods_coupon_use
where dt='2020-06-15'
)new
on old.id=new.id;
6.2.7 支付事實表(累積型快照事實表)
- 建表陳述句
DROP TABLE IF EXISTS dwd_payment_info;
CREATE EXTERNAL TABLE dwd_payment_info (
`id` STRING COMMENT '編號',
`order_id` STRING COMMENT '訂單編號',
`user_id` STRING COMMENT '用戶編號',
`province_id` STRING COMMENT '地區ID',
`trade_no` STRING COMMENT '交易編號',
`out_trade_no` STRING COMMENT '對外交易編號',
`payment_type` STRING COMMENT '支付型別',
`payment_amount` DECIMAL(16,2) COMMENT '支付金額',
`payment_status` STRING COMMENT '支付狀態',
`create_time` STRING COMMENT '創建時間',--呼叫第三方支付介面的時間
`callback_time` STRING COMMENT '完成時間'--支付完成時間,即支付成功回呼時間
) COMMENT '支付事實表表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_payment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
-
磁區規劃

-
資料裝載

1). 首日裝載
insert overwrite table dwd_payment_info partition(dt)
select
pi.id,
pi.order_id,
pi.user_id,
oi.province_id,
pi.trade_no,
pi.out_trade_no,
pi.payment_type,
pi.payment_amount,
pi.payment_status,
pi.create_time,
pi.callback_time,
nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
select * from ods_payment_info where dt='2020-06-14'
)pi
left join
(
select id,province_id from ods_order_info where dt='2020-06-14'
)oi
on pi.order_id=oi.id;
2). 每日裝載
insert overwrite table dwd_payment_info partition(dt)
select
nvl(new.id,old.id),
nvl(new.order_id,old.order_id),
nvl(new.user_id,old.user_id),
nvl(new.province_id,old.province_id),
nvl(new.trade_no,old.trade_no),
nvl(new.out_trade_no,old.out_trade_no),
nvl(new.payment_type,old.payment_type),
nvl(new.payment_amount,old.payment_amount),
nvl(new.payment_status,old.payment_status),
nvl(new.create_time,old.create_time),
nvl(new.callback_time,old.callback_time),
nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
select id,
order_id,
user_id,
province_id,
trade_no,
out_trade_no,
payment_type,
payment_amount,
payment_status,
create_time,
callback_time
from dwd_payment_info
where dt = '9999-99-99'
)old
full outer join
(
select
pi.id,
pi.out_trade_no,
pi.order_id,
pi.user_id,
oi.province_id,
pi.payment_type,
pi.trade_no,
pi.payment_amount,
pi.payment_status,
pi.create_time,
pi.callback_time
from
(
select * from ods_payment_info where dt='2020-06-15'
)pi
left join
(
select id,province_id from ods_order_info where dt='2020-06-15'
)oi
on pi.order_id=oi.id
)new
on old.id=new.id;
6.2.8 退款事實表(累積型快照事實表)
- 建表陳述句
DROP TABLE IF EXISTS dwd_refund_payment;
CREATE EXTERNAL TABLE dwd_refund_payment (
`id` STRING COMMENT '編號',
`user_id` STRING COMMENT '用戶ID',
`order_id` STRING COMMENT '訂單編號',
`sku_id` STRING COMMENT 'SKU編號',
`province_id` STRING COMMENT '地區ID',
`trade_no` STRING COMMENT '交易編號',
`out_trade_no` STRING COMMENT '對外交易編號',
`payment_type` STRING COMMENT '支付型別',
`refund_amount` DECIMAL(16,2) COMMENT '退款金額',
`refund_status` STRING COMMENT '退款狀態',
`create_time` STRING COMMENT '創建時間',--呼叫第三方支付介面的時間
`callback_time` STRING COMMENT '回呼時間'--支付介面回呼時間,即支付成功時間
) COMMENT '退款事實表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_refund_payment/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 磁區規劃

- 資料裝載

1). 首日裝載
insert overwrite table dwd_refund_payment partition(dt)
select
rp.id,
user_id,
order_id,
sku_id,
province_id,
trade_no,
out_trade_no,
payment_type,
refund_amount,
refund_status,
create_time,
callback_time,
nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
select
id,
out_trade_no,
order_id,
sku_id,
payment_type,
trade_no,
refund_amount,
refund_status,
create_time,
callback_time
from ods_refund_payment
where dt='2020-06-14'
)rp
left join
(
select
id,
user_id,
province_id
from ods_order_info
where dt='2020-06-14'
)oi
on rp.order_id=oi.id;
2). 每日裝載
insert overwrite table dwd_refund_payment partition(dt)
select
nvl(new.id,old.id),
nvl(new.user_id,old.user_id),
nvl(new.order_id,old.order_id),
nvl(new.sku_id,old.sku_id),
nvl(new.province_id,old.province_id),
nvl(new.trade_no,old.trade_no),
nvl(new.out_trade_no,old.out_trade_no),
nvl(new.payment_type,old.payment_type),
nvl(new.refund_amount,old.refund_amount),
nvl(new.refund_status,old.refund_status),
nvl(new.create_time,old.create_time),
nvl(new.callback_time,old.callback_time),
nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
select
id,
user_id,
order_id,
sku_id,
province_id,
trade_no,
out_trade_no,
payment_type,
refund_amount,
refund_status,
create_time,
callback_time
from dwd_refund_payment
where dt='9999-99-99'
)old
full outer join
(
select
rp.id,
user_id,
order_id,
sku_id,
province_id,
trade_no,
out_trade_no,
payment_type,
refund_amount,
refund_status,
create_time,
callback_time
from
(
select
id,
out_trade_no,
order_id,
sku_id,
payment_type,
trade_no,
refund_amount,
refund_status,
create_time,
callback_time
from ods_refund_payment
where dt='2020-06-15'
)rp
left join
(
select
id,
user_id,
province_id
from ods_order_info
where dt='2020-06-15'
)oi
on rp.order_id=oi.id
)new
on old.id=new.id;
6.2.9 訂單事實表(累積型快照事實表)
- 建表陳述句
DROP TABLE IF EXISTS dwd_order_info;
CREATE EXTERNAL TABLE dwd_order_info(
`id` STRING COMMENT '編號',
`order_status` STRING COMMENT '訂單狀態',
`user_id` STRING COMMENT '用戶ID',
`province_id` STRING COMMENT '地區ID',
`payment_way` STRING COMMENT '支付方式',
`delivery_address` STRING COMMENT '郵寄地址',
`out_trade_no` STRING COMMENT '對外交易編號',
`tracking_no` STRING COMMENT '物流單號',
`create_time` STRING COMMENT '創建時間(未支付狀態)',
`payment_time` STRING COMMENT '支付時間(已支付狀態)',
`cancel_time` STRING COMMENT '取消時間(已取消狀態)',
`finish_time` STRING COMMENT '完成時間(已完成狀態)',
`refund_time` STRING COMMENT '退款時間(退款中狀態)',
`refund_finish_time` STRING COMMENT '退款完成時間(退款完成狀態)',
`expire_time` STRING COMMENT '過期時間',
`feight_fee` DECIMAL(16,2) COMMENT '運費',
`feight_fee_reduce` DECIMAL(16,2) COMMENT '運費減免',
`activity_reduce_amount` DECIMAL(16,2) COMMENT '活動減免',
`coupon_reduce_amount` DECIMAL(16,2) COMMENT '優惠券減免',
`original_amount` DECIMAL(16,2) COMMENT '訂單原始價格',
`final_amount` DECIMAL(16,2) COMMENT '訂單最終價格'
) COMMENT '訂單事實表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
- 磁區規劃

3. 資料裝載

1). 首日裝載
insert overwrite table dwd_order_info partition(dt)
select
oi.id,
oi.order_status,
oi.user_id,
oi.province_id,
oi.payment_way,
oi.delivery_address,
oi.out_trade_no,
oi.tracking_no,
oi.create_time,
times.ts['1002'] payment_time,
times.ts['1003'] cancel_time,
times.ts['1004'] finish_time,
times.ts['1005'] refund_time,
times.ts['1006'] refund_finish_time,
oi.expire_time,
feight_fee,
feight_fee_reduce,
activity_reduce_amount,
coupon_reduce_amount,
original_amount,
final_amount,
case
when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='2020-06-14' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
else '9999-99-99'
end
from
(
select
*
from ods_order_info
where dt='2020-06-14'
)oi
left join
(
select
order_id,
str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
from ods_order_status_log
where dt='2020-06-14'
group by order_id
)times
on oi.id=times.order_id;
2). 每日裝載
insert overwrite table dwd_order_info partition(dt)
select
nvl(new.id,old.id),
nvl(new.order_status,old.order_status),
nvl(new.user_id,old.user_id),
nvl(new.province_id,old.province_id),
nvl(new.payment_way,old.payment_way),
nvl(new.delivery_address,old.delivery_address),
nvl(new.out_trade_no,old.out_trade_no),
nvl(new.tracking_no,old.tracking_no),
nvl(new.create_time,old.create_time),
nvl(new.payment_time,old.payment_time),
nvl(new.cancel_time,old.cancel_time),
nvl(new.finish_time,old.finish_time),
nvl(new.refund_time,old.refund_time),
nvl(new.refund_finish_time,old.refund_finish_time),
nvl(new.expire_time,old.expire_time),
nvl(new.feight_fee,old.feight_fee),
nvl(new.feight_fee_reduce,old.feight_fee_reduce),
nvl(new.activity_reduce_amount,old.activity_reduce_amount),
nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
nvl(new.original_amount,old.original_amount),
nvl(new.final_amount,old.final_amount),
case
when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='2020-06-15' and new.refund_time is null then '2020-06-15'
when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
else '9999-99-99'
end
from
(
select
id,
order_status,
user_id,
province_id,
payment_way,
delivery_address,
out_trade_no,
tracking_no,
create_time,
payment_time,
cancel_time,
finish_time,
refund_time,
refund_finish_time,
expire_time,
feight_fee,
feight_fee_reduce,
activity_reduce_amount,
coupon_reduce_amount,
original_amount,
final_amount
from dwd_order_info
where dt='9999-99-99'
)old
full outer join
(
select
oi.id,
oi.order_status,
oi.user_id,
oi.province_id,
oi.payment_way,
oi.delivery_address,
oi.out_trade_no,
oi.tracking_no,
oi.create_time,
times.ts['1002'] payment_time,
times.ts['1003'] cancel_time,
times.ts['1004'] finish_time,
times.ts['1005'] refund_time,
times.ts['1006'] refund_finish_time,
oi.expire_time,
feight_fee,
feight_fee_reduce,
activity_reduce_amount,
coupon_reduce_amount,
original_amount,
final_amount
from
(
select
*
from ods_order_info
where dt='2020-06-15'
)oi
left join
(
select
order_id,
str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
from ods_order_status_log
where dt='2020-06-15'
group by order_id
)times
on oi.id=times.order_id
)new
on old.id=new.id;
6.2.10 DWD層業務資料首日裝載腳本
- 在/home/xiaobai/bin目錄下創建腳本
ods_to_dwd_db_init.sh
[xiaobai@hadoop102 bin]$ vim ods_to_dwd_db_init.sh
#!/bin/bash
APP=gmall
if [ -n "$2" ] ;then
do_date=$2
else
echo "請傳入日期引數"
exit
fi
dwd_order_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_info partition(dt)
select
oi.id,
oi.order_status,
oi.user_id,
oi.province_id,
oi.payment_way,
oi.delivery_address,
oi.out_trade_no,
oi.tracking_no,
oi.create_time,
times.ts['1002'] payment_time,
times.ts['1003'] cancel_time,
times.ts['1004'] finish_time,
times.ts['1005'] refund_time,
times.ts['1006'] refund_finish_time,
oi.expire_time,
feight_fee,
feight_fee_reduce,
activity_reduce_amount,
coupon_reduce_amount,
original_amount,
final_amount,
case
when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='$do_date' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
else '9999-99-99'
end
from
(
select
*
from ${APP}.ods_order_info
where dt='$do_date'
)oi
left join
(
select
order_id,
str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
from ${APP}.ods_order_status_log
where dt='$do_date'
group by order_id
)times
on oi.id=times.order_id;"
dwd_order_detail="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_detail partition(dt)
select
od.id,
od.order_id,
oi.user_id,
od.sku_id,
oi.province_id,
oda.activity_id,
oda.activity_rule_id,
odc.coupon_id,
od.create_time,
od.source_type,
od.source_id,
od.sku_num,
od.order_price*od.sku_num,
od.split_activity_amount,
od.split_coupon_amount,
od.split_final_amount,
date_format(create_time,'yyyy-MM-dd')
from
(
select
*
from ${APP}.ods_order_detail
where dt='$do_date'
)od
left join
(
select
id,
user_id,
province_id
from ${APP}.ods_order_info
where dt='$do_date'
)oi
on od.order_id=oi.id
left join
(
select
order_detail_id,
activity_id,
activity_rule_id
from ${APP}.ods_order_detail_activity
where dt='$do_date'
)oda
on od.id=oda.order_detail_id
left join
(
select
order_detail_id,
coupon_id
from ${APP}.ods_order_detail_coupon
where dt='$do_date'
)odc
on od.id=odc.order_detail_id;"
dwd_payment_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_payment_info partition(dt)
select
pi.id,
pi.order_id,
pi.user_id,
oi.province_id,
pi.trade_no,
pi.out_trade_no,
pi.payment_type,
pi.payment_amount,
pi.payment_status,
pi.create_time,
pi.callback_time,
nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
select * from ${APP}.ods_payment_info where dt='$do_date'
)pi
left join
(
select id,province_id from ${APP}.ods_order_info where dt='$do_date'
)oi
on pi.order_id=oi.id;"
dwd_cart_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
select
id,
user_id,
sku_id,
source_type,
source_id,
cart_price,
is_ordered,
create_time,
operate_time,
order_time,
sku_num
from ${APP}.ods_cart_info
where dt='$do_date';"
dwd_comment_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_comment_info partition(dt)
select
id,
user_id,
sku_id,
spu_id,
order_id,
appraise,
create_time,
date_format(create_time,'yyyy-MM-dd')
from ${APP}.ods_comment_info
where dt='$do_date';
"
dwd_favor_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
select
id,
user_id,
sku_id,
spu_id,
is_cancel,
create_time,
cancel_time
from ${APP}.ods_favor_info
where dt='$do_date';"
dwd_coupon_use="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_coupon_use partition(dt)
select
id,
coupon_id,
user_id,
order_id,
coupon_status,
get_time,
using_time,
used_time,
expire_time,
coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
from ${APP}.ods_coupon_use
where dt='$do_date';"
dwd_order_refund_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_refund_info partition(dt)
select
ri.id,
ri.user_id,
ri.order_id,
ri.sku_id,
oi.province_id,
ri.refund_type,
ri.refund_num,
ri.refund_amount,
ri.refund_reason_type,
ri.create_time,
date_format(ri.create_time,'yyyy-MM-dd')
from
(
select * from ${APP}.ods_order_refund_info where dt='$do_date'
)ri
left join
(
select id,province_id from ${APP}.ods_order_info where dt='$do_date'
)oi
on ri.order_id=oi.id;"
dwd_refund_payment="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_refund_payment partition(dt)
select
rp.id,
user_id,
order_id,
sku_id,
province_id,
trade_no,
out_trade_no,
payment_type,
refund_amount,
refund_status,
create_time,
callback_time,
nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
select
id,
out_trade_no,
order_id,
sku_id,
payment_type,
trade_no,
refund_amount,
refund_status,
create_time,
callback_time
from ${APP}.ods_refund_payment
where dt='$do_date'
)rp
left join
(
select
id,
user_id,
province_id
from ${APP}.ods_order_info
where dt='$do_date'
)oi
on rp.order_id=oi.id;"
case $1 in
dwd_order_info )
hive -e "$dwd_order_info"
;;
dwd_order_detail )
hive -e "$dwd_order_detail"
;;
dwd_payment_info )
hive -e "$dwd_payment_info"
;;
dwd_cart_info )
hive -e "$dwd_cart_info"
;;
dwd_comment_info )
hive -e "$dwd_comment_info"
;;
dwd_favor_info )
hive -e "$dwd_favor_info"
;;
dwd_coupon_use )
hive -e "$dwd_coupon_use"
;;
dwd_order_refund_info )
hive -e "$dwd_order_refund_info"
;;
dwd_refund_payment )
hive -e "$dwd_refund_payment"
;;
all )
hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
;;
esac
- 權限
[xiaobai@hadoop102 bin]$ chmod +x ods_to_dwd_db_init.sh
- 執行腳本
ods_to_dwd_db_init.sh all 2020-06-14
6.2.11 DWD層業務資料每日裝載腳本
- 撰寫腳本
在/home/xiaobai/bin目錄下創建腳本ods_to_dwd_db.sh
[xiaobai@hadoop102 bin]$ vim ods_to_dwd_db.sh
#!/bin/bash
APP=gmall
# 如果是輸入的日期按照取輸入日期;如果沒輸入日期取當前時間的前一天
if [ -n "$2" ] ;then
do_date=$2
else
do_date=`date -d "-1 day" +%F`
fi
# 假設某累積型快照事實表,某天所有的業務記錄全部完成,則會導致9999-99-99磁區的資料未被覆寫,從而導致資料重復,該函式根據9999-99-99磁區的資料的末次修改時間判斷其是否被覆寫了,如果未被覆寫,就手動清理
clear_data(){
current_date=`date +%F`
current_date_timestamp=`date -d "$current_date" +%s`
last_modified_date=`hadoop fs -ls /warehouse/gmall/dwd/$1 | grep '9999-99-99' | awk '{print $6}'`
last_modified_date_timestamp=`date -d "$last_modified_date" +%s`
if [[ $last_modified_date_timestamp -lt $current_date_timestamp ]]; then
echo "clear table $1 partition(dt=9999-99-99)"
hadoop fs -rm -r -f /warehouse/gmall/dwd/$1/dt=9999-99-99/*
fi
}
dwd_order_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_order_info partition(dt)
select
nvl(new.id,old.id),
nvl(new.order_status,old.order_status),
nvl(new.user_id,old.user_id),
nvl(new.province_id,old.province_id),
nvl(new.payment_way,old.payment_way),
nvl(new.delivery_address,old.delivery_address),
nvl(new.out_trade_no,old.out_trade_no),
nvl(new.tracking_no,old.tracking_no),
nvl(new.create_time,old.create_time),
nvl(new.payment_time,old.payment_time),
nvl(new.cancel_time,old.cancel_time),
nvl(new.finish_time,old.finish_time),
nvl(new.refund_time,old.refund_time),
nvl(new.refund_finish_time,old.refund_finish_time),
nvl(new.expire_time,old.expire_time),
nvl(new.feight_fee,old.feight_fee),
nvl(new.feight_fee_reduce,old.feight_fee_reduce),
nvl(new.activity_reduce_amount,old.activity_reduce_amount),
nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
nvl(new.original_amount,old.original_amount),
nvl(new.final_amount,old.final_amount),
case
when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='$do_date' and new.refund_time is null then '$do_date'
when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
else '9999-99-99'
end
from
(
select
id,
order_status,
user_id,
province_id,
payment_way,
delivery_address,
out_trade_no,
tracking_no,
create_time,
payment_time,
cancel_time,
finish_time,
refund_time,
refund_finish_time,
expire_time,
feight_fee,
feight_fee_reduce,
activity_reduce_amount,
coupon_reduce_amount,
original_amount,
final_amount
from ${APP}.dwd_order_info
where dt='9999-99-99'
)old
full outer join
(
select
oi.id,
oi.order_status,
oi.user_id,
oi.province_id,
oi.payment_way,
oi.delivery_address,
oi.out_trade_no,
oi.tracking_no,
oi.create_time,
times.ts['1002'] payment_time,
times.ts['1003'] cancel_time,
times.ts['1004'] finish_time,
times.ts['1005'] refund_time,
times.ts['1006'] refund_finish_time,
oi.expire_time,
feight_fee,
feight_fee_reduce,
activity_reduce_amount,
coupon_reduce_amount,
original_amount,
final_amount
from
(
select
*
from ${APP}.ods_order_info
where dt='$do_date'
)oi
left join
(
select
order_id,
str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
from ${APP}.ods_order_status_log
where dt='$do_date'
group by order_id
)times
on oi.id=times.order_id
)new
on old.id=new.id;"
dwd_order_detail="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_detail partition(dt='$do_date')
select
od.id,
od.order_id,
oi.user_id,
od.sku_id,
oi.province_id,
oda.activity_id,
oda.activity_rule_id,
odc.coupon_id,
od.create_time,
od.source_type,
od.source_id,
od.sku_num,
od.order_price*od.sku_num,
od.split_activity_amount,
od.split_coupon_amount,
od.split_final_amount
from
(
select
*
from ${APP}.ods_order_detail
where dt='$do_date'
)od
left join
(
select
id,
user_id,
province_id
from ${APP}.ods_order_info
where dt='$do_date'
)oi
on od.order_id=oi.id
left join
(
select
order_detail_id,
activity_id,
activity_rule_id
from ${APP}.ods_order_detail_activity
where dt='$do_date'
)oda
on od.id=oda.order_detail_id
left join
(
select
order_detail_id,
coupon_id
from ${APP}.ods_order_detail_coupon
where dt='$do_date'
)odc
on od.id=odc.order_detail_id;"
dwd_payment_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_payment_info partition(dt)
select
nvl(new.id,old.id),
nvl(new.order_id,old.order_id),
nvl(new.user_id,old.user_id),
nvl(new.province_id,old.province_id),
nvl(new.trade_no,old.trade_no),
nvl(new.out_trade_no,old.out_trade_no),
nvl(new.payment_type,old.payment_type),
nvl(new.payment_amount,old.payment_amount),
nvl(new.payment_status,old.payment_status),
nvl(new.create_time,old.create_time),
nvl(new.callback_time,old.callback_time),
nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
select id,
order_id,
user_id,
province_id,
trade_no,
out_trade_no,
payment_type,
payment_amount,
payment_status,
create_time,
callback_time
from ${APP}.dwd_payment_info
where dt = '9999-99-99'
)old
full outer join
(
select
pi.id,
pi.out_trade_no,
pi.order_id,
pi.user_id,
oi.province_id,
pi.payment_type,
pi.trade_no,
pi.payment_amount,
pi.payment_status,
pi.create_time,
pi.callback_time
from
(
select * from ${APP}.ods_payment_info where dt='$do_date'
)pi
left join
(
select id,province_id from ${APP}.ods_order_info where dt='$do_date'
)oi
on pi.order_id=oi.id
)new
on old.id=new.id;"
dwd_cart_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
select
id,
user_id,
sku_id,
source_type,
source_id,
cart_price,
is_ordered,
create_time,
operate_time,
order_time,
sku_num
from ${APP}.ods_cart_info
where dt='$do_date';"
dwd_comment_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_comment_info partition(dt='$do_date')
select
id,
user_id,
sku_id,
spu_id,
order_id,
appraise,
create_time
from ${APP}.ods_comment_info where dt='$do_date';"
dwd_favor_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
select
id,
user_id,
sku_id,
spu_id,
is_cancel,
create_time,
cancel_time
from ${APP}.ods_favor_info
where dt='$do_date';"
dwd_coupon_use="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_coupon_use partition(dt)
select
nvl(new.id,old.id),
nvl(new.coupon_id,old.coupon_id),
nvl(new.user_id,old.user_id),
nvl(new.order_id,old.order_id),
nvl(new.coupon_status,old.coupon_status),
nvl(new.get_time,old.get_time),
nvl(new.using_time,old.using_time),
nvl(new.used_time,old.used_time),
nvl(new.expire_time,old.expire_time),
coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
from
(
select
id,
coupon_id,
user_id,
order_id,
coupon_status,
get_time,
using_time,
used_time,
expire_time
from ${APP}.dwd_coupon_use
where dt='9999-99-99'
)old
full outer join
(
select
id,
coupon_id,
user_id,
order_id,
coupon_status,
get_time,
using_time,
used_time,
expire_time
from ${APP}.ods_coupon_use
where dt='$do_date'
)new
on old.id=new.id;"
dwd_order_refund_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_refund_info partition(dt='$do_date')
select
ri.id,
ri.user_id,
ri.order_id,
ri.sku_id,
oi.province_id,
ri.refund_type,
ri.refund_num,
ri.refund_amount,
ri.refund_reason_type,
ri.create_time
from
(
select * from ${APP}.ods_order_refund_info where dt='$do_date'
)ri
left join
(
select id,province_id from ${APP}.ods_order_info where dt='$do_date'
)oi
on ri.order_id=oi.id;"
dwd_refund_payment="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_refund_payment partition(dt)
select
nvl(new.id,old.id),
nvl(new.user_id,old.user_id),
nvl(new.order_id,old.order_id),
nvl(new.sku_id,old.sku_id),
nvl(new.province_id,old.province_id),
nvl(new.trade_no,old.trade_no),
nvl(new.out_trade_no,old.out_trade_no),
nvl(new.payment_type,old.payment_type),
nvl(new.refund_amount,old.refund_amount),
nvl(new.refund_status,old.refund_status),
nvl(new.create_time,old.create_time),
nvl(new.callback_time,old.callback_time),
nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
select
id,
user_id,
order_id,
sku_id,
province_id,
trade_no,
out_trade_no,
payment_type,
refund_amount,
refund_status,
create_time,
callback_time
from ${APP}.dwd_refund_payment
where dt='9999-99-99'
)old
full outer join
(
select
rp.id,
user_id,
order_id,
sku_id,
province_id,
trade_no,
out_trade_no,
payment_type,
refund_amount,
refund_status,
create_time,
callback_time
from
(
select
id,
out_trade_no,
order_id,
sku_id,
payment_type,
trade_no,
refund_amount,
refund_status,
create_time,
callback_time
from ${APP}.ods_refund_payment
where dt='$do_date'
)rp
left join
(
select
id,
user_id,
province_id
from ${APP}.ods_order_info
where dt='$do_date'
)oi
on rp.order_id=oi.id
)new
on old.id=new.id;"
case $1 in
dwd_order_info )
hive -e "$dwd_order_info"
clear_data dwd_order_info
;;
dwd_order_detail )
hive -e "$dwd_order_detail"
;;
dwd_payment_info )
hive -e "$dwd_payment_info"
clear_data dwd_payment_info
;;
dwd_cart_info )
hive -e "$dwd_cart_info"
;;
dwd_comment_info )
hive -e "$dwd_comment_info"
;;
dwd_favor_info )
hive -e "$dwd_favor_info"
;;
dwd_coupon_use )
hive -e "$dwd_coupon_use"
clear_data dwd_coupon_use
;;
dwd_order_refund_info )
hive -e "$dwd_order_refund_info"
;;
dwd_refund_payment )
hive -e "$dwd_refund_payment"
clear_data dwd_refund_payment
;;
all )
hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
clear_data dwd_order_info
clear_data dwd_payment_info
clear_data dwd_coupon_use
clear_data dwd_refund_payment
;;
esac
- 增加腳本執行權限
[xiaobai@hadoop102 bin]$ chmod 777 ods_to_dwd_db.sh
- 執行腳本
ods_to_dwd_db.sh all 2020-06-14
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/330201.html
標籤:其他
