Big Data Development, Hive Part 20: Advanced Hive Features

2021-01-19 11:44:21 Software Design

Note:
Hive version 2.1.1

Contents

  • 1. Hive ACID and Transactions
  • 2. Hive on Tez
  • 3. Hive on Spark
  • 4. HCatalog
  • References

This post introduces some of Hive's advanced features:
1) Hive ACID and Transactions
2) Hive on Tez
3) Hive on Spark
4) HCatalog

1. Hive ACID and Transactions

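Hive has supported ACID transactions since version 0.14.

Before ACID support:

  1. Hive targeted write-once, analyze-many workloads (HDFS does not support row-level updates)
  2. Only full rewrites with INSERT OVERWRITE at the table or partition level were supported
  3. Row-level updates and deletes were not supported

With ACID support:

  1. INSERT INTO … VALUES (…), (…) …
  2. UPDATE … SET xxx=xxx WHERE …
  3. DELETE FROM … WHERE …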

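Prerequisites:

  1. Hive 0.14 or later
  2. Only the ORC file format is currently supported
  3. The table must be bucketed and must not be sorted
  4. The table must explicitly declare transactional=true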
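
The table-level prerequisites above can be expressed as a small checklist. The following is only a minimal sketch: `TableSpec` and `acid_violations` are hypothetical names invented for this illustration and are not part of Hive, and the Hive version requirement is not checked.

```python
from dataclasses import dataclass, field


@dataclass
class TableSpec:
    """Hypothetical description of a table, used only for this illustration."""
    file_format: str
    bucket_columns: list = field(default_factory=list)
    sort_columns: list = field(default_factory=list)
    tblproperties: dict = field(default_factory=dict)


def acid_violations(spec: TableSpec) -> list:
    """Return the table-level prerequisites that the given table does not meet."""
    problems = []
    if spec.file_format.upper() != "ORC":
        problems.append("table must be stored as ORC")
    if not spec.bucket_columns:
        problems.append("table must be bucketed (CLUSTERED BY ... INTO n BUCKETS)")
    if spec.sort_columns:
        problems.append("table must not be sorted (no SORTED BY)")
    if spec.tblproperties.get("transactional", "").lower() != "true":
        problems.append("TBLPROPERTIES must set 'transactional'='true'")
    return problems


# A table shaped like the test table used later in this section.
good = TableSpec("ORC", ["id"], [], {"transactional": "true"})
# A table that breaks every table-level prerequisite.
bad = TableSpec("TEXTFILE", [], ["id"], {})

print(acid_violations(good))
print(acid_violations(bad))
```

In practice Hive performs these checks itself when a transactional table is created; the sketch only makes the rules explicit.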

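Configuration (hive-site.xml). Note that hive.enforce.bucketing is only needed before Hive 2.0; later versions always enforce bucketing: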

hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=2
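
In hive-site.xml each of these settings is written as a `<property>` element with a `<name>` and a `<value>`. As a rough sketch, the snippet below renders the settings listed above in that layout; the helper name `render_hive_site` is an assumption made for this example, and the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET

# Property names and values copied from the list above.
ACID_SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "2",
}


def render_hive_site(settings: dict) -> str:
    """Render settings as a Hadoop-style <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = render_hive_site(ACID_SETTINGS)
print(xml_text)
```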

Testing a Hive transactional table:

hive> CREATE TABLE test_transaction(id int, name string) CLUSTERED BY (id) INTO 2 BUCKETS
    > STORED AS ORC TBLPROPERTIES ('transactional'='true');
OK
Time taken: 1.661 seconds
hive> INSERT INTO test_transaction VALUES (1, 'John') ,(2,'Lily'),(3, 'Tom');
Query ID = root_20201224175758_cc903192-1091-4893-8f0c-a1448a1c737b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/24 17:57:59 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1608780340033_0008, Tracking URL = http://hp3:8088/proxy/application_1608780340033_0008/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1608780340033_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2020-12-24 17:58:07,399 Stage-1 map = 0%,  reduce = 0%
2020-12-24 17:58:13,627 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.94 sec
2020-12-24 17:58:19,822 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 6.65 sec
2020-12-24 17:58:20,852 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.26 sec
MapReduce Total cumulative CPU time: 10 seconds 260 msec
Ended Job = job_1608780340033_0008
Loading data to table test.test_transaction
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 2   Cumulative CPU: 10.26 sec   HDFS Read: 12284 HDFS Write: 1438 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 260 msec
OK
Time taken: 24.358 seconds
hive> UPDATE test_transaction SET name='Richard' WHERE id=2;
Query ID = root_20201224175824_91acdbe9-5966-489c-beab-67b374fc6911
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/24 17:58:25 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1608780340033_0009, Tracking URL = http://hp3:8088/proxy/application_1608780340033_0009/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1608780340033_0009
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 2
2020-12-24 17:58:32,217 Stage-1 map = 0%,  reduce = 0%
2020-12-24 17:58:39,426 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.1 sec
2020-12-24 17:58:45,616 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 11.71 sec
MapReduce Total cumulative CPU time: 11 seconds 710 msec
Ended Job = job_1608780340033_0009
Loading data to table test.test_transaction
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2  Reduce: 2   Cumulative CPU: 11.71 sec   HDFS Read: 21436 HDFS Write: 770 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 710 msec
OK
Time taken: 23.475 seconds
hive> DELETE FROM test_transaction WHERE id=3;
Query ID = root_20201224175855_61fa8aaa-9db7-4f8d-87e9-31a0b4da835b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/24 17:58:55 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1608780340033_0010, Tracking URL = http://hp3:8088/proxy/application_1608780340033_0010/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1608780340033_0010
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 2
2020-12-24 17:59:01,750 Stage-1 map = 0%,  reduce = 0%
2020-12-24 17:59:09,985 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.04 sec
2020-12-24 17:59:16,140 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 11.83 sec
MapReduce Total cumulative CPU time: 11 seconds 830 msec
Ended Job = job_1608780340033_0010
Loading data to table test.test_transaction
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2  Reduce: 2   Cumulative CPU: 11.83 sec   HDFS Read: 21690 HDFS Write: 641 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 830 msec
OK
Time taken: 22.25 seconds
hive> 

2. Hive on Tez

Hive supports multiple execution engines (MapReduce, Tez, and Spark).

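Tez is an open-source computing framework that supports DAG (directed acyclic graph) jobs. It can combine multiple dependent jobs into a single job, which greatly improves the performance of DAG workloads.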
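
To make the DAG idea concrete, here is a toy sketch that treats dependent stages as a graph and orders them into one plan. It does not use or imitate the Tez API; the stage names and the `topological_order` helper are invented for this illustration.

```python
from collections import deque

# Hypothetical stage dependencies for one query: stage -> stages it depends on.
STAGES = {
    "scan_orders": [],
    "scan_users": [],
    "join": ["scan_orders", "scan_users"],
    "aggregate": ["join"],
}


def topological_order(deps: dict) -> list:
    """Order stages so each runs after its dependencies (Kahn's algorithm)."""
    remaining = {stage: set(parents) for stage, parents in deps.items()}
    ready = deque(sorted(s for s, parents in remaining.items() if not parents))
    order = []
    while ready:
        stage = ready.popleft()
        order.append(stage)
        for other, parents in remaining.items():
            if stage in parents:
                parents.remove(stage)
                if not parents:
                    ready.append(other)
    if len(order) != len(deps):
        raise ValueError("dependencies contain a cycle, so this is not a DAG")
    return order


plan = topological_order(STAGES)
print("one DAG job runs the stages in this order:", plan)
```

With plain MapReduce, each dependent step would typically be a separate job that writes its intermediate result to HDFS before the next job starts. Expressing the whole plan as one DAG is what lets an engine avoid those intermediate round trips.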

CDH does not support Hive on Tez, so it is skipped here.

3. Hive on Spark

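Spark originated at the AMPLab at the University of California, Berkeley. It was open-sourced in 2010 and donated to the Apache Software Foundation in 2013.
It overcomes the weaknesses of MapReduce in iterative and interactive computing by introducing the RDD (Resilient Distributed Dataset) data model.
Compared with MapReduce, it makes fuller use of memory and achieves higher computing efficiency.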

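Hive on Spark:
1) Keeps the Hive architecture unchanged and adds Spark as an execution engine, so users can choose among mr, tez, and spark to improve computing efficiency
2) Official tracking issue: https://issues.apache.org/jira/browse/HIVE-7292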

[Figures: execution flow on MapReduce and on Spark]

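Advantages:

  1. HQL needs no changes; Spark is supported seamlessly as another execution engine
  2. It is easier to combine Hive with other Spark modules such as MLlib, Spark Streaming, and GraphX
  3. Execution efficiency improves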

How to use it:
set hive.execution.engine=spark; -- use Spark as the execution engine

Spark job tuning parameters:

spark.master (jobs are submitted to YARN by default)
spark.executor.memory
spark.executor.cores
spark.yarn.executor.memoryOverhead
spark.executor.instances
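
The memory-related settings above combine into the amount of memory that YARN has to grant. The sketch below shows the arithmetic under two assumptions: the Spark 2.x default for spark.yarn.executor.memoryOverhead (10% of executor memory, with a 384 MiB minimum), and example values that are purely hypothetical. YARN's rounding of container sizes is not modeled, and the helper names are invented for this example.

```python
def parse_mb(size: str) -> int:
    """Convert a size such as '4g' or '512m' to MiB."""
    size = size.strip().lower()
    units = {"m": 1, "g": 1024}
    if size[-1] not in units:
        raise ValueError(f"unsupported size: {size!r}")
    return int(size[:-1]) * units[size[-1]]


def yarn_memory_mb(executor_memory: str, instances: int, overhead_mb: int = None) -> int:
    """Total memory requested from YARN for all executors, in MiB."""
    heap_mb = parse_mb(executor_memory)
    if overhead_mb is None:
        # Assumed Spark 2.x default: 10% of executor memory, at least 384 MiB.
        overhead_mb = max(384, int(heap_mb * 0.10))
    return instances * (heap_mb + overhead_mb)


# Hypothetical values, chosen only to show the arithmetic:
# spark.executor.memory=4g, spark.executor.instances=10.
total = yarn_memory_mb("4g", instances=10)
print(total)
```

A calculation like this helps check that spark.executor.instances multiplied by the per-executor footprint fits within the queue's YARN memory before a job is submitted.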

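Hive on Spark is often described as about 10 times faster than MapReduce. In the test below the same count(*) query ran about 6.3 times faster (435.425 s on MapReduce versus 68.884 s on Spark).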

-- Running with MapReduce
hive> select count(*) from ods_fact_sale;
Query ID = root_20201218100909_81d39c2b-0da0-40a1-8988-790040e4e3e1
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1606698967173_0319, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0319/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0319
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 1
2020-12-18 10:09:18,279 Stage-1 map = 0%,  reduce = 0%
2020-12-18 10:09:27,606 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 12.98 sec
2020-12-18 10:09:33,807 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 24.95 sec
2020-12-18 10:09:39,986 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 36.98 sec
2020-12-18 10:09:47,189 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 48.99 sec
2020-12-18 10:09:52,333 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 54.86 sec
2020-12-18 10:09:53,363 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 60.81 sec
2020-12-18 10:09:59,541 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 72.62 sec
2020-12-18 10:10:04,686 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 78.52 sec
2020-12-18 10:10:05,716 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 84.34 sec
2020-12-18 10:10:10,876 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 90.17 sec
2020-12-18 10:10:11,908 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 95.94 sec
2020-12-18 10:10:17,039 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 101.68 sec
2020-12-18 10:10:23,196 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 113.41 sec
2020-12-18 10:10:24,222 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 119.28 sec
2020-12-18 10:10:28,330 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 125.11 sec
2020-12-18 10:10:29,359 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 130.83 sec
2020-12-18 10:10:34,481 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 136.61 sec
2020-12-18 10:10:35,508 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 142.41 sec
2020-12-18 10:10:41,665 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 154.2 sec
2020-12-18 10:10:46,799 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 159.98 sec
2020-12-18 10:10:47,822 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 165.81 sec
2020-12-18 10:10:52,947 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 171.61 sec
2020-12-18 10:10:53,974 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 177.25 sec
2020-12-18 10:11:00,121 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 188.91 sec
2020-12-18 10:11:05,240 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 194.73 sec
2020-12-18 10:11:06,265 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 200.57 sec
2020-12-18 10:11:11,391 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 212.26 sec
2020-12-18 10:11:16,511 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 218.19 sec
2020-12-18 10:11:22,652 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 229.83 sec
2020-12-18 10:11:23,676 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 235.64 sec
2020-12-18 10:11:28,795 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 241.55 sec
2020-12-18 10:11:29,819 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 247.31 sec
2020-12-18 10:11:34,933 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 253.12 sec
2020-12-18 10:11:36,979 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 258.87 sec
2020-12-18 10:11:43,128 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 270.73 sec
2020-12-18 10:11:47,224 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 276.6 sec
2020-12-18 10:11:50,294 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 282.38 sec
2020-12-18 10:11:54,390 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 288.34 sec
2020-12-18 10:11:56,437 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 294.18 sec
2020-12-18 10:12:00,533 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 300.09 sec
2020-12-18 10:12:05,655 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 311.75 sec
2020-12-18 10:12:07,703 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 317.38 sec
2020-12-18 10:12:11,806 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 323.19 sec
2020-12-18 10:12:13,854 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 328.94 sec
2020-12-18 10:12:17,946 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 334.82 sec
2020-12-18 10:12:19,994 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 340.55 sec
2020-12-18 10:12:26,146 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 351.6 sec
2020-12-18 10:12:29,221 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 357.31 sec
2020-12-18 10:12:33,314 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 363.32 sec
2020-12-18 10:12:35,363 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 369.29 sec
2020-12-18 10:12:39,454 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 375.2 sec
2020-12-18 10:12:41,494 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 381.08 sec
2020-12-18 10:12:47,644 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 392.97 sec
2020-12-18 10:12:50,711 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 398.84 sec
2020-12-18 10:12:53,784 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 404.66 sec
2020-12-18 10:12:56,848 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 410.55 sec
2020-12-18 10:12:58,898 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 416.3 sec
2020-12-18 10:13:02,994 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 422.16 sec
2020-12-18 10:13:09,141 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 434.12 sec
2020-12-18 10:13:11,194 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 440.01 sec
2020-12-18 10:13:15,299 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 445.82 sec
2020-12-18 10:13:17,351 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 451.66 sec
2020-12-18 10:13:21,451 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 457.44 sec
2020-12-18 10:13:23,504 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 463.24 sec
2020-12-18 10:13:29,652 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 474.77 sec
2020-12-18 10:13:33,752 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 480.62 sec
2020-12-18 10:13:34,775 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 486.3 sec
2020-12-18 10:13:39,912 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 492.12 sec
2020-12-18 10:13:41,966 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 498.08 sec
2020-12-18 10:13:45,043 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 503.87 sec
2020-12-18 10:13:51,204 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 515.12 sec
2020-12-18 10:13:54,282 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 520.94 sec
2020-12-18 10:13:57,358 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 526.82 sec
2020-12-18 10:14:00,430 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 532.64 sec
2020-12-18 10:14:03,509 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 538.41 sec
2020-12-18 10:14:09,659 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 549.96 sec
2020-12-18 10:14:12,735 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 555.69 sec
2020-12-18 10:14:15,817 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 561.59 sec
2020-12-18 10:14:20,942 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 567.48 sec
2020-12-18 10:14:27,127 Stage-1 map = 84%,  reduce = 28%, Cumulative CPU 573.97 sec
2020-12-18 10:14:33,274 Stage-1 map = 85%,  reduce = 28%, Cumulative CPU 579.95 sec
2020-12-18 10:14:45,558 Stage-1 map = 86%,  reduce = 28%, Cumulative CPU 591.63 sec
2020-12-18 10:14:50,679 Stage-1 map = 86%,  reduce = 29%, Cumulative CPU 591.72 sec
2020-12-18 10:14:51,704 Stage-1 map = 87%,  reduce = 29%, Cumulative CPU 597.42 sec
2020-12-18 10:14:57,850 Stage-1 map = 88%,  reduce = 29%, Cumulative CPU 603.15 sec
2020-12-18 10:15:04,001 Stage-1 map = 89%,  reduce = 29%, Cumulative CPU 609.1 sec
2020-12-18 10:15:09,118 Stage-1 map = 90%,  reduce = 30%, Cumulative CPU 614.94 sec
2020-12-18 10:15:15,266 Stage-1 map = 91%,  reduce = 30%, Cumulative CPU 620.71 sec
2020-12-18 10:15:27,555 Stage-1 map = 92%,  reduce = 30%, Cumulative CPU 632.32 sec
2020-12-18 10:15:33,697 Stage-1 map = 93%,  reduce = 31%, Cumulative CPU 638.27 sec
2020-12-18 10:15:40,849 Stage-1 map = 94%,  reduce = 31%, Cumulative CPU 644.15 sec
2020-12-18 10:15:46,996 Stage-1 map = 95%,  reduce = 31%, Cumulative CPU 650.08 sec
2020-12-18 10:15:51,090 Stage-1 map = 95%,  reduce = 32%, Cumulative CPU 650.12 sec
2020-12-18 10:15:52,109 Stage-1 map = 96%,  reduce = 32%, Cumulative CPU 655.98 sec
2020-12-18 10:15:58,259 Stage-1 map = 97%,  reduce = 32%, Cumulative CPU 661.66 sec
2020-12-18 10:16:10,547 Stage-1 map = 98%,  reduce = 32%, Cumulative CPU 673.09 sec
2020-12-18 10:16:15,660 Stage-1 map = 98%,  reduce = 33%, Cumulative CPU 673.15 sec
2020-12-18 10:16:16,685 Stage-1 map = 99%,  reduce = 33%, Cumulative CPU 678.98 sec
2020-12-18 10:16:22,828 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 684.83 sec
2020-12-18 10:16:23,855 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 686.77 sec
MapReduce Total cumulative CPU time: 11 minutes 26 seconds 770 msec
Ended Job = job_1606698967173_0319
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117  Reduce: 1   Cumulative CPU: 686.77 sec   HDFS Read: 31436878698 HDFS Write: 109 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 11 minutes 26 seconds 770 msec
OK
767830000
Time taken: 435.425 seconds, Fetched: 1 row(s)
hive> exit;

-- After switching the execution engine to Spark
hive> select count(*) from ods_fact_sale;
Query ID = root_20201218102616_475f2d81-1430-4ad4-83c9-8f447a66476a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1606698967173_0320
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1606698967173_0320
Hive on Spark Session Web UI URL: http://hp3:39738

Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED    117        117        0        0       0  
Stage-1 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 50.34 s    
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 50.34 second(s)
Spark Job[0] Metrics: TaskDurationTime: 306308, ExecutorCpuTime: 239414, JvmGCTime: 5046, BytesRead / RecordsRead: 31436886423 / 767830000, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 6669 / 117, ShuffleBytesWritten / ShuffleRecordsWritten: 6669 / 117
OK
767830000
Time taken: 68.884 seconds, Fetched: 1 row(s)
hive> 
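
The speedup can be worked out directly from the two runs above. This is a small sketch of the arithmetic only; the constants are copied from the "Time taken" lines and from the "Spark job[0] finished successfully" line, and comparing the Spark job duration against the MapReduce wall-clock time is a rough comparison rather than a like-for-like one.

```python
# Wall-clock times copied from the "Time taken" lines of the two runs above.
MR_SECONDS = 435.425
SPARK_SECONDS = 68.884
# Duration reported by "Spark job[0] finished successfully in 50.34 second(s)".
SPARK_JOB_SECONDS = 50.34


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


end_to_end = speedup(MR_SECONDS, SPARK_SECONDS)
job_only = speedup(MR_SECONDS, SPARK_JOB_SECONDS)
print(f"end-to-end speedup: {end_to_end:.2f}x")
print(f"job-only speedup:   {job_only:.2f}x")
```

Both figures come from a single run on one cluster, so they describe this test rather than a general ratio.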

4. HCatalog

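HCatalog is a metadata and table management system for Hadoop. It builds on the Hive metadata layer and presents the relationships in Hadoop data through a SQL-like language.

HCatalog lets users share data and metadata across Hive, Pig, and MapReduce. When writing applications, users do not need to care how or where the data is stored, and they are shielded from changes to the schema or storage format.

Through HCatalog, tools can access the Hive metastore on Hadoop. It provides connectors for MapReduce and Pig, so users can read and write Hive's relational, column-oriented data with those tools.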

[Figures: HCatalog architecture diagrams]

References

1. https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started (Hive on Spark: Getting Started)
2. https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_hive_configure.html#xd_583c10bfdbd326ba-590cb1d1-149e9ca9886–7b23 (switching the execution engine to Spark on Cloudera)
