[Usage Sharing] All About Hive Partitioned Tables - 有解無憂

I. Static Partitions

1. Creating a statically partitioned table:

create table employees (
 name   string,
 salary  float,
 subordinated array<string>,
 deductions map<string,float>,
 address  struct<street:string,city:string,state:string,zip:int>
 ) partitioned by (country string,state string)
 row format delimited
 fields terminated by "\t"
 collection items terminated by ","
 map keys terminated by ":"
 lines terminated by "\n"
 stored as textfile;

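After the table is created, its storage path looks the same as that of an ordinary managed (internal) table, and the table definition gains the partition columns. No partition directories exist yet, because the table we just created has no data. In fact, unless they need to optimize query performance, users of the table do not need to care whether a column is a partition column.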
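To see which columns are partition columns, the table metadata can be inspected. A minimal sketch using the employees table defined above:

```sql
-- Lists the regular columns first, then a separate "Partition Information"
-- section containing country and state, plus the table's Location on HDFS.
describe formatted employees;
```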

2. Adding a partition
alter table employees add partition (country="china",state="Asia");
View the table's partitions: show partitions employees;
Path on HDFS: /user/hive/warehouse/zxz.db/employees/country=china/state=Asia. Partitions are stored as directories and subdirectories.

3. Inserting data:
Format 1:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row …];
Format 2 (recommended):
load data local inpath '/home/had/data1.txt' into table employees partition (country="china",state="Asia");
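A short sketch of format 1 (available in Hive 0.14 and later). Hive has no literals for complex types (array, map, struct), so INSERT ... VALUES cannot populate the employees table directly. This example therefore assumes a hypothetical table, employees_simple, with only primitive columns; the table name and row values are made up for illustration:

```sql
-- Hypothetical table used only for this illustration.
create table employees_simple (
  name   string,
  salary float
) partitioned by (country string, state string);

-- The partition values are fixed in the partition clause,
-- so each row supplies only the non-partition columns.
insert into table employees_simple partition (country="china", state="Asia")
values ("Zhang San", 8000.0), ("Li Si", 9500.0);
```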

4. Querying with partitions: (partitioned tables are generally queried with a where clause on the partition columns)
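As a minimal sketch of such a query against the employees table above (the selected columns and filter values are only illustrative):

```sql
-- Filtering on the partition columns lets Hive read only the
-- country=china/state=Asia directory instead of scanning the whole table.
select name, salary
from employees
where country = "china" and state = "Asia";
```

Partition columns can appear in the select list and the where clause just like ordinary columns.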

5. CTAS and LIKE statements
Create a table and copy the data:
create table employees1 as select * from employees;
Create a table and copy only the table structure:
create table employees2 like employees;

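6. External partitioned tables:
External tables can be partitioned too. In fact, users will find this is the most common setup for managing large production datasets. The combination gives users a way to share data with other tools while still optimizing query performance.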

  create external table employees_ex(
   name   string,
   salary  float,
   subordinated array<string>,
   deductions map<string,float>,
   address  struct<street:string,city:string,state:string,zip:int>
   ) partitioned by (country string,state string)
   row format delimited
   fields terminated by "\t"
   collection items terminated by ","
   map keys terminated by ":"
   lines terminated by "\n"
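   stored as textfile
   location "/user/had/data/";    -- otherwise the same as an ordinary statically partitioned table, apart from the external keyword
This lets us change where the data lives without losing any data, something a managed (internal) partitioned table cannot offer: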

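Because the table is external, its data can live anywhere on HDFS. Here the data is loaded automatically under /user/had/data/ (the location we specified when creating the external table):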
load data local inpath '/home/had/data.txt' into table employees_ex partition (country="china",state="Asia");

When some older data needs to be moved elsewhere, we can use Hadoop's distcp command to copy it to another path:
hadoop distcp /user/had/data/country=china/state=Asia /user/had/data_old/country=china/state=Asia

Alter the table so that the partition in Hive points to the new location of the moved data:
alter table employees_ex partition(country="china",state="Asia") set location '/user/had/data_old/country=china/state=Asia';

Delete the data at the old path with the HDFS rm command:
hdfs dfs -rm -r /user/had/data/country=china/state=Asia

If you forget where a partition's data is located, check it as follows:
describe extended employees_ex partition (country="china",state="Asia");

7. Dropping a partition
alter table employees drop partition(country="china",state="Asia");

8. Other ALTER statements
Archive a partition into a HAR file:
alter table employees archive partition (country="china",state="Asia")

Restore an archived (HAR) partition to an ordinary partition:
alter table employees unarchive partition (country="china",state="Asia")

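Protect a partition from being dropped (the no_drop and offline clauses were removed in Hive 2.0.0, so this and the following statements apply to earlier versions):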
alter table employees partition (country="china",state="Asia") enable no_drop

Protect a partition from being queried:
alter table employees partition (country="china",state="Asia") enable offline

Allow the partition to be dropped and queried again:
alter table employees partition (country="china",state="Asia") disable no_drop
alter table employees partition (country="china",state="Asia") disable offline

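9. Inserting data into a table with a query
insert overwrite table copy_employees partition (country="china",state="Asia") select es.name, es.salary, es.subordinated, es.deductions, es.address from employees es where es.country="china" and es.state="Asia";
Use insert into instead of insert overwrite to append rather than replace. The partition columns are left out of the select list because the partition clause already supplies their values.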

II. Dynamic Partitions:

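Why use dynamic partitions? Take an example: suppose China had 50 provinces, each with 50 cities, and each city with 100 districts. Creating all of those static partitions one by one would take far too long, so we use dynamic partitions instead.
Dynamic partitioning is disabled by default in Hive releases before 0.9.0 and enabled by default from 0.9.0 onward. When enabled, it runs in strict mode by default, which requires at least one partition column to be static. This helps prevent a badly designed query from generating a huge number of partitions. For example, a user might mistakenly partition by a timestamp, which would create one partition for every second! The relevant settings are shown here.
Turn off strict partition mode (in strict mode at least one partition column must be static):
-- partition mode; the default is strict, as described above
set hive.exec.dynamic.partition.mode=nonstrict;
-- enable dynamic partitioning; the default is true in Hive 0.9.0 and later
set hive.exec.dynamic.partition=true;
-- maximum number of dynamic partitions; the default is 1000
set hive.exec.max.dynamic.partitions=1000;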

1. Creating an ordinary table to be loaded with dynamic partitions:

create table if not exists  zxz_5(
 name string,
 nid int,
 phone string,
 ntime date
 ) partitioned by (year int,month int)
 row format delimited
 fields terminated by "|"
 lines terminated by "\n"
 stored as textfile;

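2. Loading data into the dynamically partitioned table: note that the partition columns must come last in the select list, in the same order as in the partition clause. Hive matches them by position, so aliasing them to the partition column names, as done here, is for readability.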
insert overwrite table zxz_5 partition (year,month) select name,nid,phone,ntime,year(ntime) as year ,month(ntime) as month from zxz_dy;
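For comparison, here is a sketch of an insert that also works in strict mode, where at least one partition column is static. It reuses zxz_5 and zxz_dy from above; the year value 2022 is an arbitrary assumption:

```sql
-- year is static (fixed in the partition clause); month is dynamic and is
-- taken from the last column of the select list.
insert overwrite table zxz_5 partition (year=2022, month)
select name, nid, phone, ntime, month(ntime) as month
from zxz_dy
where year(ntime) = 2022;

-- Check which partitions were created.
show partitions zxz_5;
```

Static partition columns must be listed before dynamic ones in the partition clause.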

This article was published by Huawei Cloud.
