In my Flume flow, I want a custom dynamic HDFS path, but the interceptors are not populating any data.
Sample data: 188 17 2016-06-01 00:31:10 6200.041736 0
Configuration:
agent2.sources.source2.interceptors = i2 i3 i4
agent2.sources.source2.interceptors.i2.type = regex_extractor
agent2.sources.source2.interceptors.i3.type = regex_extractor
agent2.sources.source2.interceptors.i4.type = regex_extractor
# regex to pick up the year
agent2.sources.source2.interceptors.i2.regex = (?<=\t)[0-9]{4}(?=-)
agent2.sources.source2.interceptors.i2.serializers = y
agent2.sources.source2.interceptors.i2.serializers.y.name = year
# regex to pick up the month
agent2.sources.source2.interceptors.i3.regex = (?<=-)[0-9]{2}(?=-)
agent2.sources.source2.interceptors.i3.serializers = m
agent2.sources.source2.interceptors.i3.serializers.m.name = month
# regex to pick up the day
agent2.sources.source2.interceptors.i4.regex = (?<=-)[0-9]{2}(?=\t)
agent2.sources.source2.interceptors.i4.serializers = d
agent2.sources.source2.interceptors.i4.serializers.d.name = day
# Define the HDFS sink 2 –year and month
agent2.sinks.sink-hdfs2.type = hdfs
agent2.sinks.sink-hdfs2.hdfs.path = /group-project/consumption/%{year}/%{month}
agent2.sinks.sink-hdfs2.hdfs.filePrefix = %{year}-%{month}
agent2.sinks.sink-hdfs2.hdfs.fileSuffix = .txt
Answer:
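The lookbehind for the year and the lookahead for the day match only a tab character. They will not match one or more spaces, so you are better off using \\s.
Also, Flume needs two backslashes for regex escape sequences: write \\t rather than \t.
Alternatively, you can use a single regex that matches the whole date and assign its capture groups to different serializers. For example, (\\d{4})-(\\d{2})-(\\d{2})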
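As a rough sketch of that single-regex approach applied to the configuration in the question: the interceptor name i1 is made up for illustration, and the agent and source names are simply carried over from the question. It assumes the date always appears in the event body as yyyy-MM-dd and that regex_extractor assigns capture groups to serializers in the order they are listed, as in the user guide example.

```properties
# Sketch only: one interceptor (hypothetical name i1) replaces i2, i3 and i4.
agent2.sources.source2.interceptors = i1
agent2.sources.source2.interceptors.i1.type = regex_extractor
# Three capture groups: year, month, day. Backslashes are doubled, as noted above.
agent2.sources.source2.interceptors.i1.regex = (\\d{4})-(\\d{2})-(\\d{2})
# Serializers are listed in the same order as the capture groups.
agent2.sources.source2.interceptors.i1.serializers = y m d
agent2.sources.source2.interceptors.i1.serializers.y.name = year
agent2.sources.source2.interceptors.i1.serializers.m.name = month
agent2.sources.source2.interceptors.i1.serializers.d.name = day
```

With the sample line from the question, this should add the headers year=2016, month=06 and day=01, so the existing %{year}/%{month} placeholders in hdfs.path and hdfs.filePrefix would have values to resolve. Note that the original regexes contain no capture groups at all, and the user guide describes regex_extractor as extracting match groups, so that is probably another reason the headers stayed empty.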
The Flume User Guide has a good example:
If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used:
a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three
The extracted event will contain the same body, but the following headers will have been added:
one=>1, two=>2, three=>3
