Collecting Tomcat logs using Fluentd and Elasticsearch

05 / Oct / 2016 by Arun Dhyani

In our previous blog, we covered the basics of Fluentd, the lifecycle of Fluentd events, and the primary directives involved. In this blog, we'll configure Fluentd to ship Tomcat logs to Elasticsearch. We'll also discuss the filter directive/plugin and how to configure it to add a hostname field to the event stream.

Fluentd

Fluentd is a log collector that implements a unified logging layer. It collects logs from various sources and uploads them to datastores. Fluentd reads a log file and forwards its contents as an event stream, either directly to a datastore or to a Fluentd aggregator that in turn sends the logs on to a datastore. In our use case, we'll forward logs directly to our datastore, Elasticsearch. Elasticsearch is a search server that stores data as schema-free JSON documents.

So, let’s get started.

First, we need to install td-agent on the application server. Td-agent is the stable distribution of Fluentd provided by Treasure Data. To install td-agent on Ubuntu (Trusty), run the following command:


curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh

Next, we need to install the Elasticsearch plugin for td-agent, which gives td-agent the ability to forward events to Elasticsearch.

Run the following command to install the plugin:


/usr/sbin/td-agent-gem install fluent-plugin-elasticsearch

Now that everything is installed, it’s time to jump into the td-agent configuration to forward logs to Elasticsearch.

The td-agent configuration file is located at /etc/td-agent/td-agent.conf. In our configuration, we'll define three blocks/directives/plugins:

  • source: It defines the input source of events.
<source>
  type tail
  format multiline
  format_firstline /[0-9]{2}-[A-Za-z]{3}-[0-9]{4}/
  format1 /^(?<datetime>[0-9]{2}-[A-Za-z]{3}-[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}) (?<Log-Level>[A-Z]*) (?<message>.*)$/
  path /opt/apache-tomcat-8.0.33/logs/catalina.out
  tag tomcat.logs
</source>

Parameters defined inside source directive:

type: This defines the input plugin to use. The tail plugin reads events by tailing the text file.
format: This defines how to parse events from the file. In our configuration, it is multiline, which means a single event can span multiple lines. This is useful for collecting a stack trace as a single event in the Tomcat log.
format_firstline: This is a regular expression that identifies the first line of an event. In our configuration, it matches the date, which means each log event starts with a date.
format1: This is a regular expression that defines the fields and full format of a single log event. In our configuration, it captures three fields: datetime, Log-Level, and message.
path: This defines the path of the log file to tail.
tag: This assigns a unique name to the log events. Tags are used to route log events to the datastore.
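As a quick sanity check, the format1 expression can be exercised outside Fluentd. The sketch below uses Python rather than Fluentd's Ruby engine, so the named groups become (?P<name>...) and Log-Level is renamed log_level (hyphens aren't valid in Python group names); the sample catalina.out line is illustrative:

```python
import re

# Python translation of the format1 expression from the source block.
EVENT_RE = re.compile(
    r"^(?P<datetime>[0-9]{2}-[A-Za-z]{3}-[0-9]{4} "
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}) "
    r"(?P<log_level>[A-Z]*) (?P<message>.*)$"
)

# An illustrative line in Tomcat 8's catalina.out date format.
line = ("26-Jun-2016 12:00:00.000 INFO [main] "
        "org.apache.catalina.startup.Catalina.start Server startup in 1234 ms")

m = EVENT_RE.match(line)
print(m.group("datetime"))   # 26-Jun-2016 12:00:00.000
print(m.group("log_level"))  # INFO
print(m.group("message"))
```

A continuation line of a stack trace would not match format_firstline's date pattern, which is exactly why Fluentd folds it into the preceding event.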

  • filter: The filter directive enables us to modify the event stream.
<filter tomcat.logs>
  type record_transformer
  <record>
    hostname ${hostname}
  </record>
</filter>

Parameters defined in the filter directive:

type: This specifies the filter plugin to use. The record_transformer plugin manipulates the incoming event stream.
record: The record block defines key-value pairs to add to each event record.
hostname: This is the name of the field to add; ${hostname} expands to the machine's hostname.
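Conceptually, the filter rewrites each event in place. With the config above, an event would change roughly as follows (field values are illustrative, and the hostname value depends on the machine):

```
# Before the filter:
tomcat.logs: {"datetime": "26-Jun-2016 12:00:00.000", "Log-Level": "INFO", "message": "Server startup in 1234 ms"}

# After the filter:
tomcat.logs: {"datetime": "26-Jun-2016 12:00:00.000", "Log-Level": "INFO", "message": "Server startup in 1234 ms", "hostname": "app-server-01"}
```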

  • match: The match directive finds event streams with matching tags and processes them. In our configuration, we use it to forward logs to Elasticsearch.
<match tomcat.logs>
  type elasticsearch
  host demo.elasticsearch.tothenew.com
  port 9200
  logstash_format true
  logstash_prefix tomcat.logs
  flush_interval 1s
</match>

Parameters defined in the match directive:

type: This specifies the output plugin to use; in our case, elasticsearch.
host: It specifies the IP address or domain name of the Elasticsearch server.
port: It specifies the port of the Elasticsearch server (9200 is the default).
logstash_format: This stores events in Logstash-style daily indices (prefix-YYYY.MM.DD).
logstash_prefix: It defines the prefix of the index names under which events are stored (here, indices named tomcat.logs-YYYY.MM.DD).
flush_interval: This sets how often buffered events are flushed to Elasticsearch.

Now, start td-agent using the following command:


service td-agent start
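To verify that events are flowing, you can check the td-agent log for errors and then query Elasticsearch. The host below is the one from our match block; substitute your own:

```shell
# Watch td-agent for startup or connection errors
tail -f /var/log/td-agent/td-agent.log

# List the daily indices created by logstash_format/logstash_prefix
curl 'http://demo.elasticsearch.tothenew.com:9200/_cat/indices/tomcat.logs-*?v'

# Fetch a few recent events
curl 'http://demo.elasticsearch.tothenew.com:9200/tomcat.logs-*/_search?pretty&size=3'
```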

So, this is how you configure centralized logging using Fluentd and Elasticsearch.

Hope this blog was useful.
