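Flume Basics and Environment Setup
1. Basic components
1) source: collection (reads data in from wherever it originates)
2) channel: buffering (holds the data in memory, in a file, or in Kafka)
3) sink: output (reads data from the channel and writes it to one or more destinations, e.g. HDFS or Hive)
2. Environment setup (requires JDK 8 to be installed)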

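# Upload the archive and extract it into a directory
tar zxvf /root/software/tran_zip/apache-flume-1.6.0-bin.tar.gz -C software
# Set the environment variables (FLUME_HOME and PATH)
vi /etc/profile
# Reload the profile so the changes take effect
source /etc/profile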

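# Edit the configuration file flume-env.sh
# (this path assumes the extracted apache-flume-1.6.0-bin directory was renamed to flume-1.6.0)
cd /root/software/flume-1.6.0/conf
cp flume-env.sh.template flume-env.sh
# Set JAVA_HOME in this file
vi flume-env.sh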

# Verify the installation
# (run from the bin directory)
flume-ng version
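
The steps above edit /etc/profile and flume-env.sh without showing what goes in them. A minimal sketch of the relevant lines follows. The FLUME_HOME value assumes the install path used in the cd command above, and the JDK path in the comment is purely hypothetical.

```shell
# Lines to append to /etc/profile (paths are assumptions; adjust to your layout).
# FLUME_HOME assumes the extracted directory was renamed to flume-1.6.0.
export FLUME_HOME=/root/software/flume-1.6.0
export PATH="$PATH:$FLUME_HOME/bin"

# In $FLUME_HOME/conf/flume-env.sh, point Flume at the JDK.
# The location below is hypothetical; use your actual JDK 8 directory.
# export JAVA_HOME=/usr/local/jdk1.8.0
```

With $FLUME_HOME/bin on PATH, flume-ng version can be run from any directory, and the $FLUME_HOME references in the launch commands later on resolve correctly.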

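3. Exercise 1: collect data from a network port and print it to the console
The key to using Flume is writing the configuration file:
1) Configure the source
2) Configure the channel
3) Configure the sink
4) Wire the three components together
Names used in example.conf:
a1: name of the agent
r1: name of the source
c1: name of the channel
k1: name of the sink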
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
# Bind to one hostname: master here, or localhost for a purely local test
a1.sources.r1.bind = master
a1.sources.r1.port = 44444
# Describe the sink: log events to the console
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# A source can write to multiple channels, but each sink reads from exactly one channel
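
The binding keys are easy to mistype: sources use the plural .channels, sinks use the singular .channel, and a misspelled prefix leaves the component unbound. The helper below is a rough sketch of a pre-flight check, not part of Flume. The name check_agent is made up for illustration, and it only confirms that each declared source and sink has a binding line.

```shell
# check_agent CONF AGENT: verify every declared source and sink has a channel binding.
# Hypothetical helper; it greps the properties file and does not invoke Flume.
check_agent() {
  conf=$1
  agent=$2
  rc=0
  [ -r "$conf" ] || { echo "cannot read $conf"; return 2; }
  # Component lists, e.g. the value of "a1.sources = r1".
  sources=$(sed -n "s/^$agent\.sources[[:space:]]*=[[:space:]]*//p" "$conf")
  sinks=$(sed -n "s/^$agent\.sinks[[:space:]]*=[[:space:]]*//p" "$conf")
  for s in $sources; do
    # Sources bind with the plural key: <agent>.sources.<name>.channels
    grep -q "^$agent\.sources\.$s\.channels[[:space:]]*=" "$conf" \
      || { echo "source $s has no .channels binding"; rc=1; }
  done
  for k in $sinks; do
    # Sinks bind with the singular key: <agent>.sinks.<name>.channel
    grep -q "^$agent\.sinks\.$k\.channel[[:space:]]*=" "$conf" \
      || { echo "sink $k has no .channel binding"; rc=1; }
  done
  return $rc
}
```

For the configuration above, check_agent "$FLUME_HOME/conf/example.conf" a1 should print nothing and return 0. It returns 1 and names the component when a binding line is missing or misspelled.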

Start the agent:
flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/example.conf --name a1 -Dflume.root.logger=INFO,console
Test it:
# telnet must be installed first: yum -y install telnet
telnet master 44444
4. Exercise 2: monitor a file and print newly appended data to the console in real time

# Agent design: exec source + memory channel + logger sink
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/data/flume_data.log
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# A source can write to multiple channels, but each sink reads from exactly one channel

Run it:
# Steps:
# 1. Create the file exec-memory-logger.conf
# 2. Put the configuration above into that file
# 3. Start the agent
flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec-memory-logger.conf --name a1 -Dflume.root.logger=INFO,console
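
Once the agent is running, nothing appears until something appends to the monitored file. The sketch below is a small generator to run from a second terminal. The function name append_lines and the text of the test lines are made up for illustration.

```shell
# append_lines FILE COUNT [DELAY]: append COUNT numbered test lines to FILE,
# pausing DELAY seconds (default 1) between lines so they arrive one at a time.
append_lines() {
  file=$1
  count=$2
  delay=${3:-1}
  i=1
  while [ "$i" -le "$count" ]; do
    echo "test event $i $(date '+%Y-%m-%d %H:%M:%S')" >> "$file"
    i=$((i + 1))
    sleep "$delay"
  done
}
```

For example, append_lines /root/data/flume_data.log 5 writes five lines one second apart. The file path is the one given in the tail -F command of the configuration above.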
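Result: each line appended to /root/data/flume_data.log appears as an event in the agent's console output.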

5. Exercise 3: collect logs from server A (website) onto server B (hdfs) in real time
Server A: exec source + memory channel + avro sink (used to cross nodes)
Server B: avro source + memory channel + logger sink (prints to the console)
# Server A
aver-memory-avro.conf
# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /root/data/log_data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c
# Describe the sink: forward events to the avro source on server B
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname=master
exec-memory-avro.sinks.avro-sink.port=44444
# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory
# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel
# Server B
aver-memory-logger.conf
# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel
# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = master
avro-memory-logger.sources.avro-source.port = 44444
# Describe the sink: log events to the console
avro-memory-logger.sinks.logger-sink.type = logger
# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory
# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
Start the agents:
# 1. Start the logger agent on server B first, so the avro source is listening
flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/aver-memory-logger.conf --name avro-memory-logger -Dflume.root.logger=INFO,console
# 2. Then start the aver-memory-avro agent on server A
flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/aver-memory-avro.conf --name exec-memory-avro -Dflume.root.logger=INFO,console
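
The launch commands in this article differ only in the configuration file and the agent name, so they can be wrapped in one function. This is a convenience sketch, not a Flume feature. The name start_agent, the DRY_RUN switch, and the fallback install path are assumptions introduced here.

```shell
# start_agent CONF_FILE AGENT_NAME: run one Flume agent in the foreground.
# Set DRY_RUN=1 to print the command instead of executing it.
start_agent() {
  # Fall back to the install path used earlier if FLUME_HOME is unset.
  home=${FLUME_HOME:-/root/software/flume-1.6.0}
  conf=$1
  name=$2
  set -- flume-ng agent --conf "$home/conf" --conf-file "$home/conf/$conf" \
    --name "$name" -Dflume.root.logger=INFO,console
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

On server B, start_agent aver-memory-logger.conf avro-memory-logger comes first. On server A, start_agent aver-memory-avro.conf exec-memory-avro follows. The --name value must match the agent prefix used inside the file, which is why it differs from the file name.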

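Basic flow of Exercise 3: log file on server A -> exec source -> memory channel -> avro sink -> network (master:44444) -> avro source on server B -> memory channel -> logger sink -> console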

