Chaining Multiple Agents · Big Data

[TOC]

# Analysis

Collection requirement: a business system uses log4j to generate logs, and the log content keeps growing. The data appended to the log file must be collected into HDFS in real time, using chained agents.

![](https://box.kancloud.cn/e18d4e601a0465768b2a1cb254c179b7_1035x303.png)

Based on this requirement, first define the following three elements for each agent.

First Flume agent

* Source: monitors the file for newly appended content : exec 'tail -F file'
* Sink: the data sender, which serializes the events : avro sink
* Channel: the pipe between source and sink; either a file channel or a memory channel can be used

Second Flume agent

* Source: receives the data and deserializes it : avro source
* Sink: the HDFS file system : HDFS sink
* Channel: the pipe between source and sink; either a file channel or a memory channel can be used

# Configuration files

First machine, Flume-agent1

~~~
# tail-avro-avro-logger.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
# Follow this file and emit each appended line as an event
a1.sources.r1.command = tail -F /root/logs/test.log

# Describe the sink
# The avro sink acts as the data sender
a1.sinks.k1.type = avro
# Host to push events to; set this to your own receiving machine
a1.sinks.k1.hostname = master
# Port on the receiving machine
a1.sinks.k1.port = 41414
# Number of events sent per batch
a1.sinks.k1.batch-size = 10

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
~~~

Flume-agent2: avro-hdfs.conf

~~~
# Name the components on this agent
a1.sources = r1
a1.sinks = s1
a1.channels = c1

# The avro source acts as the receiving service
a1.sources.r1.type = avro
# Bind to an IP address and port
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414

a1.sinks.s1.type = hdfs
# Target directory on HDFS
a1.sinks.s1.hdfs.path = hdfs://master:9000/flumedata
a1.sinks.s1.hdfs.filePrefix = access_log
a1.sinks.s1.hdfs.batchSize = 100
a1.sinks.s1.hdfs.fileType = DataStream
a1.sinks.s1.hdfs.writeFormat = Text
a1.sinks.s1.hdfs.rollSize = 10240
a1.sinks.s1.hdfs.rollCount = 1000
a1.sinks.s1.hdfs.rollInterval = 10
a1.sinks.s1.hdfs.round = true
a1.sinks.s1.hdfs.roundValue = 10
a1.sinks.s1.hdfs.roundUnit = minute

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.s1.channel = c1
~~~
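
To check the chain end to end, something has to keep appending lines to the file that the first agent tails. The following is a minimal sketch of such a test generator, not part of the original setup: the function name `append_log_lines` and the line format are hypothetical, and the only value taken from the configuration above is the path `/root/logs/test.log`.

```python
import time
from datetime import datetime
from pathlib import Path


def append_log_lines(path, count, delay=0.0):
    """Append `count` timestamped lines to `path`, one write per line.

    Hypothetical helper for testing; the line format is made up and only
    imitates a log4j-style entry.
    """
    log_file = Path(path)
    # Create the parent directory if it does not exist yet.
    log_file.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    for i in range(count):
        # Open in append mode so existing content is kept, which is the
        # kind of growth that `tail -F` follows.
        with log_file.open("a", encoding="utf-8") as fh:
            fh.write(f"{datetime.now().isoformat()} INFO test event {i}\n")
        written += 1
        if delay:
            time.sleep(delay)
    return written


if __name__ == "__main__":
    # Path taken from a1.sources.r1.command in the first agent's config.
    append_log_lines("/root/logs/test.log", count=100, delay=0.5)
```

Assuming both agents are running, executing this script on the first machine should cause files with the `access_log` prefix to appear under `/flumedata` on HDFS.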