文章目錄
- 1、解壓 Sqoop 安裝包
- 2、配置 sqoop-env.sh
- 3、配置系統環境變數
- 4、sqoop 連接 MySQL
- 5、sqoop 連接 MySQL 匯入到 Hive
- 5.1 準備
- 5.2 MySQL創建模擬資料表(資料源)
- 5.3 Hive 創建資料表(接收)
- 5.3 MySQL 匯出到 Hive
準備環境:
- Hadoop 完全分布式集群環境
- Sqoop 安裝包:http://archive.apache.org/dist/sqoop/
1、解壓 Sqoop 安裝包
上傳本地安裝包:

解壓后并重命名:

回傳頂部
2、配置 sqoop-env.sh
復制 sqoop-env-template.sh 模板,并將模板重命名為 sqoop-env.sh,在 sqoop-env.sh 中添加 Hdoop、Hbase、Hive、HBase的安裝路徑:

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/src/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
#set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/src/hbase
#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/src/hive
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/local/src/zookeeper/conf
回傳頂部
3、配置系統環境變數
配置 Linux 系統環境變數(/etc/profile),添加 Sqoop 組件的路徑:
# set sqoop environment
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
保存退出后 source 使其生效!
回傳頂部
4、sqoop 連接 MySQL
為了使 Sqoop 能夠連接 MySQL 資料庫,需要將 /../download/mysql-connector-java-5.1.46.jar 檔案放入 sqoop 的 lib 目錄中:
[root@server download]# cp mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
執行 Sqoop 前需要先啟動 Hadoop 集群,在 server 節點執行 start-all.sh 命令啟動 Hadoop 集群,查看行程:
[root@server conf]# jps
5392 HMaster
1762 ResourceManager
4754 QuorumPeerMain
10226 Jps
1603 SecondaryNameNode
1412 NameNode
[root@agent1 src]# jps
1235 DataNode
1863 HRegionServer
1306 NodeManager
1563 QuorumPeerMain
2687 Jps
[root@agent2 src]# jps
1344 NodeManager
1570 QuorumPeerMain
1238 DataNode
2678 Jps
1868 HRegionServer
測驗 Sqoop 是否能夠正常連接 MySQL 資料庫:
[root@server download]# cp mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
[root@server download]# sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/02/25 19:35:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
22/02/25 19:35:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri Feb 25 19:35:46 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sys
能 夠 查 看 到 MySQL 數 據 庫 中 的 information_schema 、 hive 、 mysql 、performance_schema、sys 等資料庫,說明 Sqoop 可以正常連接 MySQL,
回傳頂部
5、sqoop 連接 MySQL 匯入到 Hive
5.1 準備
為了使 Sqoop 能夠連接 Hive,需要將 hive 組件 /usr/local/src/hive/lib 目錄下的 hive-common-2.3.9.jar 也放入 Sqoop 安裝路徑的 lib 目錄中:
[root@server ~] cp /usr/local/src/hive/lib/hive-common-2.3.9.jar /usr/local/src/sqoop/lib/
5.2 MySQL創建模擬資料表(資料源)
MySQL中創建 test 資料庫,在 test 中創建 student 表,在 student 表中插入了 3 條資料:
[root@server lib]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.7.22 MySQL Community Server (GPL)
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database test;
Query OK, 1 row affected (0.00 sec)
mysql> use test;
Database changed
mysql> create table student(number char(9) primary key, name varchar(10));
Query OK, 0 rows affected (0.25 sec)
mysql> insert into student values('01','zhangsan');
Query OK, 1 row affected (0.00 sec)
mysql> insert into student values('02','lisi');
Query OK, 1 row affected (0.00 sec)
mysql> insert into student values('03','wangwu');
Query OK, 1 row affected (0.00 sec)
mysql> select * from student;
+--------+----------+
| number | name |
+--------+----------+
| 01 | zhangsan |
| 02 | lisi |
| 03 | wangwu |
+--------+----------+
3 rows in set (0.00 sec)
5.3 Hive 創建資料表(接收)
在 Hive 中創建 test 資料庫和 student 資料表:
hive> create database test;
.................
OK
Time taken: 6.211 seconds
hive> use test;
OK
Time taken: 0.018 seconds
hive> create table student(number STRING, name STRING)
> row format delimited
> fields terminated by "|"
> stored as textfile;
OK
Time taken: 9.939 seconds
5.3 MySQL 匯出到 Hive
需要說明該命令的以下幾個引數:
- –connect:MySQL 資料庫連接 URL,
- –username & –password:MySQL 資料庫的用戶名和密碼,
- –table:匯出的資料表名,
- –fields-terminated-by:Hive 中欄位分隔符,
- –delete-target-dir:洗掉匯出目的目錄,
- –num-mappers:Hadoop 執行 Sqoop 匯入匯出啟動的 map 任務數,
- –hive-import --hive-database:匯出到 Hive 的資料庫名,
- –hive-table:匯出到 Hive 的表名,
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://server:3306/test --username root --password 123456 --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database test --hive-table student
..........
22/02/25 20:02:10 INFO hive.HiveImport: OK
22/02/25 20:02:10 INFO hive.HiveImport: Time taken: 10.6 seconds
22/02/25 20:02:13 INFO hive.HiveImport: Loading data to table test.student
22/02/25 20:02:14 INFO hive.HiveImport: OK
22/02/25 20:02:14 INFO hive.HiveImport: Time taken: 3.846 seconds
22/02/25 20:02:15 INFO hive.HiveImport: Hive import complete.
22/02/25 20:02:15 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
查看Hive表中的資料:
hive> show tables;
OK
student
Time taken: 0.409 seconds, Fetched: 1 row(s)
hive> select * from student;
OK
01 zhangsan
02 lisi
03 wangwu
Time taken: 1.371 seconds, Fetched: 3 row(s)
回傳頂部
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/433248.html
標籤:其他
