In the Flume configuration file, we need to −
- Name the components of the current agent.
- Describe/Configure the source.
- Describe/Configure the sink.
- Describe/Configure the channel.
- Bind the source and the sink to the channel.
Naming the Components
First of all, you need to name/list the components such as sources, sinks, and the channels of the agent, as shown below.
agent_name.sources = source_name
agent_name.sinks = sink_name
agent_name.channels = channel_name
Flume supports various sources, sinks, and channels. They are listed in the table given below.
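For example, for an agent named TwitterAgent that uses a Twitter source, a memory channel, and an HDFS sink, the components are listed as follows.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
After listing the components of the agent, you have to describe the source(s), sink(s), and channel(s) by providing values to their properties.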
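An agent is not limited to one component of each kind: several names can be listed on the same line, separated by spaces. The fragment below is a minimal sketch of that form. The agent name a1 and the component names r1, r2, k1, k2, c1, and c2 are hypothetical and are not used elsewhere in this tutorial.

```properties
# Hypothetical agent "a1" with two sources, two sinks, and two channels.
# Names within each list are separated by whitespace.
a1.sources = r1 r2
a1.sinks = k1 k2
a1.channels = c1 c2
```

Each name listed here then needs its own block of properties, described in the same way as for a single component.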
Describing the Source
Each source will have a separate list of properties. The property named “type” is common to every source, and it is used to specify the type of the source we are using.
Along with the property “type”, you must provide values for all the required properties of a particular source to configure it, as shown below.
agent_name.sources.source_name.type = value
agent_name.sources.source_name.property2 = value
agent_name.sources.source_name.property3 = value
For example, if we consider the Twitter source, the following are the properties to which we must provide values to configure it.
TwitterAgent.sources.Twitter.type = Twitter (type name)
TwitterAgent.sources.Twitter.consumerKey =
TwitterAgent.sources.Twitter.consumerSecret =
TwitterAgent.sources.Twitter.accessToken =
TwitterAgent.sources.Twitter.accessTokenSecret =
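To illustrate that every source type carries its own property list, here is a sketch of a different source, the NetCat source, which listens on a TCP port and turns each line of text it receives into an event. The agent name a1, the source name r1, and the bind address and port values are assumptions chosen for illustration.

```properties
# Hypothetical agent "a1" with a NetCat source named "r1".
a1.sources = r1
# "type" selects the source implementation.
a1.sources.r1.type = netcat
# Host name or IP address to bind to (illustrative value).
a1.sources.r1.bind = localhost
# Port to listen on (illustrative value).
a1.sources.r1.port = 44444
```

Only “type” is shared with the Twitter source; the remaining properties differ completely from one source type to another.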
Describing the Sink
Just like the source, each sink will have a separate list of properties. The property named “type” is common to every sink, and it is used to specify the type of the sink we are using. Along with the property “type”, you must provide values for all the required properties of a particular sink to configure it, as shown below.
agent_name.sinks.sink_name.type = value
agent_name.sinks.sink_name.property2 = value
agent_name.sinks.sink_name.property3 = value
For example, if we consider the HDFS sink, the following are the properties to which we must provide values to configure it.
TwitterAgent.sinks.HDFS.type = hdfs (type name)
TwitterAgent.sinks.HDFS.hdfs.path = (path of the HDFS directory in which to store the data)
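Beyond “type” and “hdfs.path”, the HDFS sink accepts optional properties that control the file format and when files are rolled. The following sketch shows a few of them. The HDFS path and the numeric values are illustrative assumptions, not recommendations.

```properties
TwitterAgent.sinks.HDFS.type = hdfs
# Hypothetical HDFS directory; replace with a path on your cluster.
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets
# Write raw event data instead of the default SequenceFile format.
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
# Number of events written to a file before it is flushed to HDFS.
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
# Roll to a new file after 10000 events.
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
# A value of 0 disables rolling by file size and by elapsed time.
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
```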
Describing the Channel
Flume provides various channels to transfer data between sources and sinks. Therefore, along with the sources and the sinks, you must describe the channel used in the agent.
To describe each channel, you need to set the required properties, as shown below.
agent_name.channels.channel_name.type = value
agent_name.channels.channel_name.property2 = value
agent_name.channels.channel_name.property3 = value
For example, if we consider the memory channel, the following are the properties to which we must provide values to configure it.
TwitterAgent.channels.MemChannel.type = memory (type name)
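The memory channel also has optional sizing properties. The sketch below sets two of them; the numeric values are assumptions chosen for illustration and should be tuned to the actual event rate.

```properties
TwitterAgent.channels.MemChannel.type = memory
# Maximum number of events held in the channel (illustrative value).
TwitterAgent.channels.MemChannel.capacity = 10000
# Maximum number of events per transaction (illustrative value).
# This value must not exceed capacity.
TwitterAgent.channels.MemChannel.transactionCapacity = 100
```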
Binding the Source and the Sink to the Channel
Since the channels connect the sources and sinks, both must be bound to the channel, as shown below. A source uses the property “channels” because it can write to more than one channel, whereas a sink uses the property “channel” because it reads from exactly one.
agent_name.sources.source_name.channels = channel_name
agent_name.sinks.sink_name.channel = channel_name
The following example shows how to bind the sources and the sinks to a channel. Here, we consider the Twitter source, memory channel, and HDFS sink.
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
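Putting the five steps together, a complete configuration file might look like the sketch below. The source type is the class name of the Twitter source shipped with Flume 1.x, which is an assumption about the Flume version in use. The credential placeholders and the HDFS path are hypothetical and must be replaced with your own values.

```properties
# Name the components of the agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Describe the source (placeholder credentials; replace with your own).
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET

# Describe the sink (hypothetical HDFS path).
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets

# Describe the channel.
TwitterAgent.channels.MemChannel.type = memory

# Bind the source and the sink to the channel.
# The source property is "channels" (plural); the sink property is "channel" (singular).
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
```

Comments must sit on their own lines starting with #, because any text that follows a value on the same line is read as part of that value.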
Starting a Flume Agent
After configuration, we have to start the Flume agent. It is done as follows −
$ bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
- agent − Command to start the Flume agent
- --conf, -c <conf> − Uses the configuration files in the <conf> directory
- --conf-file, -f <file> − Specifies the path of the agent's configuration file
- --name, -n <name> − Name of the agent to start (TwitterAgent in this example)
- -Dproperty=value − Sets a Java system property value