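1. Database operations

1.1 Create a database

1.2 Query databases

1.3 Switch databases

1.4 Drop a database

2. Table operations

2.1 Create a managed (internal) table

2.2 Create an external table

2.3 Convert between managed and external tables

2.4 Load data

2.5 Alter a table

2.6 Drop a table

2.7 Partitioned tables

2.8 Bucketed tables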

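1. Database operations

1.1 Create a database

(1) Create a database named school. By default a database is stored on HDFS at /user/hive/warehouse/<database name>.db (the warehouse root is set by hive.metastore.warehouse.dir), so this one ends up in /user/hive/warehouse/school.db.

hive (default)> create database school;

(2) To avoid an error when the database already exists, add an if not exists check.

hive (default)> create database if not exists school_1;

(3) Create a database and specify where it is stored on HDFS (note: the path must point to the database's own directory, not just to its parent).

hive (default)> create database if not exists school_2 location '/school.db';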
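The default-location rule in (1) and the explicit location in (3) can be summarized with a small Python sketch. It only models the naming rule and is not Hive code; `database_location` is a hypothetical helper, and the sketch assumes the warehouse root is left at its default value.

```python
from typing import Optional

# Assumption: hive.metastore.warehouse.dir is left at its default value.
WAREHOUSE_DIR = "/user/hive/warehouse"


def database_location(name: str, location: Optional[str] = None) -> str:
    """Return the HDFS directory that holds database `name` (simplified model)."""
    if location is not None:
        # An explicit LOCATION is used as given, so it must already name the
        # database's own directory rather than its parent.
        return location
    # Default layout: <warehouse root>/<database name>.db
    return f"{WAREHOUSE_DIR}/{name}.db"


print(database_location("school"))                  # /user/hive/warehouse/school.db
print(database_location("school_2", "/school.db"))  # /school.db
```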

1.2 Query databases

(1) List all databases

hive (default)> show databases;

(2) Filter the list of databases by a pattern

hive (default)> show databases like "school*";
OK
database_name
school
school_1
school_2
Time taken: 0.036 seconds, Fetched: 3 row(s)
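According to the Hive DDL manual, the pattern after like is a simple wildcard pattern: `*` matches any characters and `|` separates alternatives, so "school*" matches all three databases above. The Python sketch below models that matching rule under this assumption; `show_like` is a hypothetical name, and newer Hive releases may treat the pattern differently.

```python
import re
from typing import Iterable, List


def show_like(names: Iterable[str], pattern: str) -> List[str]:
    """Filter names the way SHOW DATABASES LIKE is documented to (simplified)."""
    alternatives = []
    for alt in pattern.split("|"):
        # Escape regex metacharacters, then turn each escaped '*' into '.*'.
        alternatives.append(re.escape(alt).replace(r"\*", ".*"))
    regex = re.compile("(?:" + "|".join(alternatives) + ")")
    # fullmatch: the whole name must match one of the alternatives.
    return sorted(name for name in names if regex.fullmatch(name))


databases = ["default", "school", "school_1", "school_2"]
print(show_like(databases, "school*"))         # ['school', 'school_1', 'school_2']
print(show_like(databases, "default|school"))  # ['default', 'school']
```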

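(3) Show detailed information about a database

hive (default)> desc database school;

1.3 Switch databases

Switch from the current database, default, to the school database; the prompt then shows the new current database: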

hive (default)> use school;
OK
Time taken: 0.036 seconds
hive (school)>

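1.4 Drop a database

(1) Drop an empty database

hive (school)> drop database school_1;

(2) Dropping a database that does not exist raises an error, so it is safer to add an if exists check

hive (school)> drop database if exists school_1;

(3) A database that still contains tables cannot be dropped by default; add cascade to force the drop, which also drops every table in it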

hive (school)> drop database school_2 cascade;
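The three cases above (plain drop, if exists, cascade) can be modeled in a few lines of Python. `Catalog` and its methods are hypothetical names for illustration; the sketch only mirrors the rules stated in this section and does not talk to Hive.

```python
from typing import Dict, List


class Catalog:
    """Toy model of the DROP DATABASE rules above (not a Hive API)."""

    def __init__(self) -> None:
        self.databases: Dict[str, List[str]] = {}  # database -> table names

    def create_database(self, name: str) -> None:
        self.databases.setdefault(name, [])

    def drop_database(self, name: str, if_exists: bool = False,
                      cascade: bool = False) -> None:
        if name not in self.databases:
            if if_exists:
                return  # if exists: a missing database is silently ignored
            raise LookupError(f"database {name} does not exist")
        if self.databases[name] and not cascade:
            # Without cascade (Hive's default is restrict), a database that
            # still has tables is not dropped.
            raise ValueError(f"database {name} is not empty")
        # With cascade, the tables go together with the database.
        del self.databases[name]


catalog = Catalog()
catalog.create_database("school_2")
catalog.databases["school_2"].append("stu")  # pretend school_2 holds a table
try:
    catalog.drop_database("school_2")
except ValueError as err:
    print(err)  # database school_2 is not empty
catalog.drop_database("school_2", cascade=True)
catalog.drop_database("school_1", if_exists=True)  # no error
```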

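2. Table operations

Syntax for creating a table

create [external] table [if not exists] table_name [(col_name data_type [comment col_comment], ...)]
[comment table_comment]
[partitioned by (col_name data_type [comment col_comment], ...)]
[clustered by (col_name, col_name, ...) into num_buckets buckets]
[row format row_format]
[stored as file_format]
[location hdfs_path]

To copy only the structure of an existing table:

create [external] table [if not exists] table_name like existing_table_name [location hdfs_path]

Explanation of the clauses

(1) create table creates a table with the given name. If a table with the same name already exists, an exception is thrown; use the if not exists option to ignore it.

(2) The external keyword creates an external table and lets you point it at the actual data with location. When data is loaded into a managed (internal) table, Hive moves it into the warehouse directory; for an external table Hive only records where the data lives and never moves it. When a table is dropped, a managed table loses both its metadata and its data, while an external table loses only its metadata and its data stays in place.

(3) comment: adds a comment to the table or to a column.

(4) partitioned by: creates a partitioned table.

(5) clustered by: creates a bucketed table; into num_buckets buckets sets the number of buckets.

(6) row format: describes how rows and fields are laid out, for example row format delimited fields terminated by ',' for comma-separated text.

(7) stored as: sets the storage file format. Common formats are textfile (plain text, the default), sequencefile (binary sequence file) and rcfile (columnar format). Use stored as textfile for plain-text data, and stored as sequencefile if the data needs to be compressed.

(8) location: sets where the table is stored on HDFS.

(9) like: copies the structure of an existing table without copying its data.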

2.1 Create a managed (internal) table

(1) Create the managed table stu1 with an int field id and a string field name; fields are separated by a comma (',').

hive (school)> create table if not exists stu1(id int,name string) row format delimited fields terminated by ',';

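(2) Create a table from the result of a query (the rows returned by the query are written into the new table)

hive (school)> create table if not exists stu2 as select id,name from stu1;

(3) Create a table with the same structure as an existing table (no data is copied)

hive (school)> create table if not exists stu3 like stu1;

(4) Show detailed information about a table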

hive (school)> desc formatted stu1;
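Because stu1 is declared with row format delimited fields terminated by ',', Hive reads each line of its text files by splitting on commas and converting each field to the column type. The Python sketch below is a simplified model of that behavior (it ignores escaping, the \N null marker and int overflow); `parse_stu1_line` is a hypothetical helper and the sample lines are made up.

```python
import re
from typing import List, Optional, Tuple

INT_PATTERN = re.compile(r"[+-]?\d+")


def parse_stu1_line(line: str, delimiter: str = ",") -> Tuple[Optional[int], Optional[str]]:
    """Approximate how one text line maps onto stu1(id int, name string)."""
    fields: List[Optional[str]] = list(line.rstrip("\n").split(delimiter))
    # Fewer fields than columns: the missing columns read as NULL (None here).
    # More fields than columns: the extra fields are ignored.
    fields += [None] * (2 - len(fields))
    raw_id, name = fields[0], fields[1]
    # A value that cannot be parsed as int reads back as NULL instead of failing.
    id_value = int(raw_id) if raw_id is not None and INT_PATTERN.fullmatch(raw_id) else None
    return id_value, name


print(parse_stu1_line("1,xiaoming"))    # (1, 'xiaoming')
print(parse_stu1_line("abc,xiaohong"))  # (None, 'xiaohong')
print(parse_stu1_line("3"))             # (3, None)
```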

2.2 Create an external table

Create the external table tea with an int field id and a string field name; fields are separated by a comma (',').

hive (school)> create external table if not exists tea(id int,name string) row format delimited fields terminated by ',';

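2.3 Convert between managed and external tables

(1) Convert the managed table stu1 into an external table

hive (school)> alter table stu1 set tblproperties('EXTERNAL'='TRUE');

(2) Convert the external table tea into a managed table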

hive (school)> alter table tea set tblproperties('EXTERNAL'='FALSE');
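The effect of these conversions shows up when a table is truncated or dropped (2.6): the managed/external flag alone decides whether the data files survive. A minimal Python model of that rule follows; the `Table` class and the file names in it are hypothetical, and the sketch does not call Hive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Table:
    """Toy model of a Hive table's type and its data files (not a Hive API)."""
    name: str
    external: bool = False
    files: List[str] = field(default_factory=list)

    def set_external(self, value: bool) -> None:
        # Mirrors alter table ... set tblproperties('EXTERNAL'='TRUE' or 'FALSE').
        self.external = value

    def truncate(self) -> None:
        if self.external:
            raise ValueError(f"cannot truncate external table {self.name}")
        self.files.clear()  # managed: rows are removed, the table definition stays

    def drop(self) -> List[str]:
        """Drop the table; return the data files that remain on HDFS afterwards."""
        # Metadata is always removed; data survives only for external tables.
        return list(self.files) if self.external else []


stu1 = Table("stu1", files=["student.txt"])
stu1.set_external(True)
print(stu1.drop())  # ['student.txt']  (data kept)

tea = Table("tea", external=True, files=["teacher.txt"])
tea.set_external(False)
print(tea.drop())   # []  (data deleted with the table)
```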

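2.4 Load data

(1) Load the local file /export/data/student.txt into table stu1 (the local file is copied)

hive (school)> load data local inpath '/export/data/student.txt' into table stu1;

(2) Load the HDFS file /teacher.txt into table tea (the file is moved into the table's directory)

hive (school)> load data inpath '/teacher.txt' into table tea;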
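After the second command, /teacher.txt no longer exists at its old HDFS path, while /export/data/student.txt is still on the local disk. The Python sketch below imitates this with temporary directories standing in for the local disk and HDFS; `load_data` is a hypothetical helper, and it does not model Hive's renaming of files whose names clash.

```python
import shutil
import tempfile
from pathlib import Path


def load_data(src: Path, table_dir: Path, local: bool) -> Path:
    """Model of load data [local] inpath ... into table (without overwrite)."""
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / src.name
    if local:
        shutil.copy2(src, dest)  # local: the source file is copied and stays put
    else:
        shutil.move(str(src), str(dest))  # HDFS path: the file is moved into the table dir
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Stand-ins for /export/data/student.txt (local disk) and /teacher.txt (HDFS).
    student = root / "local" / "student.txt"
    teacher = root / "hdfs" / "teacher.txt"
    for path in (student, teacher):
        path.parent.mkdir(parents=True)
        path.write_text("1,sample\n")

    load_data(student, root / "warehouse" / "school.db" / "stu1", local=True)
    load_data(teacher, root / "warehouse" / "school.db" / "tea", local=False)
    student_kept, teacher_kept = student.exists(), teacher.exists()
    tea_has_file = (root / "warehouse" / "school.db" / "tea" / "teacher.txt").exists()

print(student_kept, teacher_kept)  # True False
```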

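2.5 Alter a table

(1) Rename a table

Rename table stu1 to stu:

hive (school)> alter table stu1 rename to stu;

(2) Add a column

Add an int column named class to table stu:

hive (school)> alter table stu add columns(class int);

(3) Change a column

Change column id of table stu into a string column named number: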

hive (school)> alter table stu change column id number string;
OK
Time taken: 0.123 seconds
hive (school)> desc stu;
OK
col_name	data_type	comment
number              	string              	                    
name                	string              	                    
class               	int

(4) Replace columns

Replace all columns of table stu with an int field id, a string field name and an int field class:

hive (school)> alter table stu replace columns(id int,name string,class int);
OK
Time taken: 0.107 seconds
hive (school)> desc stu;
OK
col_name	data_type	comment
id                  	int                 	                    
name                	string              	                    
class               	int
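Both change column and replace columns only edit the table's metadata; the data files are not rewritten, and for a delimited text table the stored fields are matched to the current columns by position when the table is read. The Python sketch below illustrates this; `read_row` and the sample line are hypothetical, and only int and string columns are modeled.

```python
import re
from typing import Dict, List, Optional, Tuple, Union

Column = Tuple[str, str]  # (column name, column type)
Value = Union[int, str, None]


def read_row(line: str, schema: List[Column]) -> Dict[str, Value]:
    """Interpret one comma-delimited line under the table's current schema."""
    fields = line.rstrip("\n").split(",")
    row: Dict[str, Value] = {}
    for i, (col_name, col_type) in enumerate(schema):
        # Fields are matched to columns by position, not by name.
        raw: Optional[str] = fields[i] if i < len(fields) else None
        if raw is not None and col_type == "int":
            row[col_name] = int(raw) if re.fullmatch(r"[+-]?\d+", raw) else None
        else:
            row[col_name] = raw
    return row


line = "1,xiaoming,1"  # hypothetical line in stu's data file; ALTER TABLE never rewrites it
after_change = [("number", "string"), ("name", "string"), ("class", "int")]
after_replace = [("id", "int"), ("name", "string"), ("class", "int")]
print(read_row(line, after_change))   # {'number': '1', 'name': 'xiaoming', 'class': 1}
print(read_row(line, after_replace))  # {'id': 1, 'name': 'xiaoming', 'class': 1}
```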

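2.6 Drop a table

(1) Drop a table

hive (school)> drop table tea;

(2) Remove all rows from a table while keeping its definition. truncate works only on managed tables, not on external ones; since stu (formerly stu1) was made external in 2.3, it has to be switched back with 'EXTERNAL'='FALSE' before this command succeeds.

hive (school)> truncate table stu;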

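2.7 Partitioned tables

(1) Create a partitioned table student with an int field id and a string field name, partitioned by the int column class; fields are separated by a comma (','). The partition column is declared only in partitioned by, not in the regular column list.

hive (school)> create table student(id int,name string) partitioned by(class int) row format delimited fields terminated by ',';

(2) Add partitions

Add a single partition:

hive (school)> alter table student add partition(class=1);

Add several partitions at once, separating them with spaces: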

hive (school)> alter table student add partition(class=2) partition(class=3);

(3) Show the partitions of a table

hive (school)> show partitions student;
OK
partition
class=1
class=2
class=3

(4) Load data into a specific partition with insert ... select, using a where clause to pick the matching rows from stu

hive (school)> insert into table student partition(class=1) select id,name from stu where class=1;

(5) Query a single partition

hive (school)> select * from student where class = 1;
OK
student.id	student.name	student.class
1	xiaoming	1
2	xiaohong	1
3	xiaogang	1

(6) Drop partitions

Drop a single partition:

hive (school)> alter table student drop partition(class=1);

Drop several partitions at once, separating them with commas:

hive (school)> alter table student drop partition(class=2),partition(class=3);
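Each partition of student is a subdirectory of the table directory named after the partition column and its value, which is why show partitions prints class=1 and so on, and why a filter such as where class = 1 only needs to read one directory. The Python sketch below models that layout and the pruning step, assuming student lives in the default warehouse location; `partition_dir` and `prune` are hypothetical helpers, and Hive's escaping of special characters in partition values is left out.

```python
from typing import Dict, List


def partition_dir(table_dir: str, spec: Dict[str, object]) -> str:
    """Directory holding one partition, e.g. <table dir>/class=1 (no escaping)."""
    return "/".join([table_dir.rstrip("/")] + [f"{k}={v}" for k, v in spec.items()])


def prune(partitions: List[str], column: str, value: object) -> List[str]:
    """Keep only the partitions that a filter `column = value` has to read."""
    wanted = f"{column}={value}"
    return [p for p in partitions if wanted in p.split("/")]


table_dir = "/user/hive/warehouse/school.db/student"
partitions = ["class=1", "class=2", "class=3"]  # as listed by show partitions
print(partition_dir(table_dir, {"class": 1}))  # /user/hive/warehouse/school.db/student/class=1
print(prune(partitions, "class", 1))           # ['class=1']
```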

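2.8 Bucketed tables

(1) Enable bucketing enforcement so that inserts distribute rows into the declared buckets. This setting is only needed on Hive 1.x; Hive 2.0 removed the property and always enforces bucketing.

hive (school)> set hive.enforce.bucketing = true;

(2) Create the bucketed table stu_cluster with an int field id, a string field name and an int field class, bucketed by class into 3 buckets; fields are separated by a comma (',').

hive (school)> create table stu_cluster(id int,name string,class int) clustered by(class) into 3 buckets row format delimited fields terminated by ',';

(3) Populate the buckets through the intermediate table stu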

hive (school)> insert overwrite table stu_cluster select * from stu cluster by(class);
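Hive picks a row's bucket by hashing the clustered by column and taking the result modulo the number of buckets. The Python sketch below assumes the legacy scheme (bucketing_version 1), in which the hash of an int is the int itself; Hive 3 tables default to a Murmur3-based hash (bucketing_version 2), so bucket numbers there may differ. The function names are hypothetical.

```python
def legacy_int_hash(value: int) -> int:
    """Legacy Hive hash of an INT value: the value itself (assumption, see above)."""
    return value


def bucket_for(value: int, num_buckets: int) -> int:
    """Bucket index for a row whose bucketing column holds `value`."""
    # Clearing the sign bit keeps the index non-negative for negative hashes,
    # like Java's (hash & Integer.MAX_VALUE) % numBuckets.
    return (legacy_int_hash(value) & 0x7FFFFFFF) % num_buckets


for cls in (1, 2, 3):
    print(cls, bucket_for(cls, 3))
# 1 1
# 2 2
# 3 0
```

Under this assumption, rows with class 3 land in bucket 0 while classes 1 and 2 land in buckets 1 and 2.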

Please credit the source when reposting. Original article: https://www.uj5u.com/qita/317840.html

