目錄
- 一.Failover Sink Processor測驗
- 二.雙層的Flume架構
- 三.單source多channel多sink
一.Failover Sink Processor測驗
官網解釋Failover Sink Processor:
Failover Sink Processor維護一個按優先級排列的sink串列,確保只要有一個sink可用,事件就會被處理(交付),
Failover機制的作業原理是將失敗的接收轉移到池中,在池中為它們分配一個冷卻期,在重新嘗試它們之前,隨著順序故障的增加而增加,一旦接收器成功地發送了一個事件,它就會被恢復到活動池,sink有一個與它們相關聯的優先級,數量越大,優先級越高,如果一個接收器在發送事件時失敗,下一個具有最高優先級的接收器將被嘗試下一步發送事件,例如,優先級為100的接收器在優先級為80的接收器之前被激活,如果沒有指定優先級,則thr優先級根據配置中指定的sink的順序確定,
要進行配置,將sink組處理器設定為Failover,并為所有單個的sink設定優先級,所有指定的優先級必須是唯一的,此外,可以使用maxpenalty屬性設定Failover時間的上限(以毫秒為單位),
下圖中44446的優先級更高:

左邊agent的配置failover.conf:
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop000
a1.sinks.k1.port = 44445
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop000
a1.sinks.k2.port = 44446
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
k2即agent1的44446埠的優先級高(數字越大優先級越高),
發送資料:
[hadoop@hadoop000 apache-flume-1.6.0-cdh5.15.1-bin]$ telnet localhost 44444
Trying 192.168.198.128...
Connected to localhost.
Escape character is '^]'.
aaa
OK
bbb
OK
ccc
OK
ddd
OK
eee
OK
fff
OK
44446接收到資訊:
21/01/25 18:14:47 INFO ipc.NettyServer: [id: 0x0ce2a19e, /192.168.198.128:45240 => /192.168.198.128:44446] OPEN
21/01/25 18:14:47 INFO ipc.NettyServer: [id: 0x0ce2a19e, /192.168.198.128:45240 => /192.168.198.128:44446] BOUND: /192.168.198.128:44446
21/01/25 18:14:47 INFO ipc.NettyServer: [id: 0x0ce2a19e, /192.168.198.128:45240 => /192.168.198.128:44446] CONNECTED: /192.168.198.128:45240
21/01/25 18:15:40 INFO sink.LoggerSink: Event: { headers:{} body: 61 61 61 0D aaa. }
21/01/25 18:16:11 INFO sink.LoggerSink: Event: { headers:{} body: 62 62 62 0D bbb. }
將agent3 kill掉,44445埠被激活:
21/01/25 18:14:46 INFO ipc.NettyServer: [id: 0x946f8c34, /192.168.198.128:55142 => /192.168.198.128:44445] OPEN
21/01/25 18:14:46 INFO ipc.NettyServer: [id: 0x946f8c34, /192.168.198.128:55142 => /192.168.198.128:44445] BOUND: /192.168.198.128:44445
21/01/25 18:14:46 INFO ipc.NettyServer: [id: 0x946f8c34, /192.168.198.128:55142 => /192.168.198.128:44445] CONNECTED: /192.168.198.128:55142
21/01/25 18:16:42 INFO sink.LoggerSink: Event: { headers:{} body: 63 63 63 0D ccc. }
21/01/25 18:16:48 INFO sink.LoggerSink: Event: { headers:{} body: 64 64 64 0D ddd. }
21/01/25 18:47:19 INFO sink.LoggerSink: Event: { headers:{} body: 65 65 65 0D eee. }
重啟agent3,44446埠再次被激活:
21/01/25 18:50:10 INFO ipc.NettyServer: [id: 0x58750737, /192.168.198.128:45596 => /192.168.198.128:44446] OPEN
21/01/25 18:50:10 INFO ipc.NettyServer: [id: 0x58750737, /192.168.198.128:45596 => /192.168.198.128:44446] BOUND: /192.168.198.128:44446
21/01/25 18:50:10 INFO ipc.NettyServer: [id: 0x58750737, /192.168.198.128:45596 => /192.168.198.128:44446] CONNECTED: /192.168.198.128:45596
21/01/25 18:50:13 INFO sink.LoggerSink: Event: { headers:{} body: 66 66 66 0D fff. }
二.雙層的Flume架構
這篇博客寫的特別詳細:Flume日志收集分層架構應用實踐.
雙層Flume的好處:
- 解耦,hdfs或者kafka需要升級時,第二層flume可以進行緩沖,不會影響第一層,
- 安全,hdfs或者kafka直接暴露給第一層不安全(第一層很多flume來自其他部門,第二層在本地),
- 利于業務的分組管理,將第一組的繁雜業務在第二層可以進行分組,
- 小檔案的數量會大大減少,
- 外部某個型別的業務日志資料節點需要擴容,直接在L1層將資料流指向資料平臺內部與之相對應的L2層Flume Agent節點組即可,
三.單source多channel多sink
第一層source發送一個訊息,channel1和channel2都會傳輸,agent2和agent3都會收到相同的資料,所以這種架構可以將同一份資料,即可以匯入hdfs進行離線計算,也可同時匯入實時框架進行實時計算,實作多用途,
轉載請註明出處,本文鏈接:https://www.uj5u.com/ruanti/253065.html
標籤:其他
