Thursday, 26 November 2015

Setting up a collectd-based monitoring system (server with multiple agents) and sending system metrics to Graphite

There are many tutorials on setting up collectd as an agent on a single machine. However, I have not found many that describe how to set up a centralized collectd server with multiple collectd agents. In this setup, the server aggregates system metrics from multiple clients and sends them to Graphite.
This is my attempt to note down the steps I used to set up a centralized collectd metric collection system.

Download the latest version of collectd.tar.gz (collectd-5.5 or later). By default, collectd ships with many plugins, e.g. cpu, load, disk, graphite (write_graphite), redis, mysql etc., so it is possible to capture almost all system metrics.

A basic installation of collectd involves the usual ./configure, make and make install steps, for the collectd server as well as the collectd agent:

# tar -xvf collectd-version.tar.gz
# cd collectd-version
# ./configure --disable-turbostat   # only if SL < 6.6 or CentOS < 6.6
# make
# make install
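If you build on several machines, the --disable-turbostat decision above can be scripted. This is a minimal sketch, assuming the release string comes from /etc/redhat-release (e.g. "CentOS release 6.5 (Final)"); the function name collectd_configure_flags is hypothetical and not part of collectd.

```shell
#!/bin/sh
# Hypothetical helper: print extra ./configure flags for this host.
# Assumption: $1 looks like the contents of /etc/redhat-release.
collectd_configure_flags() {
    release="$1"
    # Extract the first "major.minor" number that follows "release ".
    version=$(printf '%s\n' "$release" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')
    major=${version%%.*}
    minor=${version#*.}
    flags=""
    case "$release" in
        CentOS*|Scientific*)
            # turbostat is disabled on releases older than 6.6 (per the steps above).
            if [ -n "$version" ] && { [ "$major" -lt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -lt 6 ]; }; }; then
                flags="--disable-turbostat"
            fi
            ;;
    esac
    printf '%s\n' "$flags"
}
```

For example:
# ./configure $(collectd_configure_flags "$(cat /etc/redhat-release)")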

Copy the collectd init script to /etc/init.d:
# cp ./contrib/redhat/init.d-collectd /etc/init.d/collectd
# chmod +x /etc/init.d/collectd

Create symbolic links in /usr/bin for the binaries in /opt/collectd/sbin:
# ln -s /opt/collectd/sbin/collectdmon /usr/bin/collectdmon
# ln -s /opt/collectd/sbin/collectd /usr/bin/collectd
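The two ln -s commands above can be wrapped in a small helper that is safe to re-run, e.g. after upgrading collectd. This is a sketch, assuming the default /opt/collectd prefix of a source build; link_collectd_bins is a hypothetical name, and the directories are parameters so it can be tried on scratch directories first.

```shell
#!/bin/sh
# Hypothetical helper: link the collectd binaries from the source-install
# sbin directory into a directory on $PATH.
link_collectd_bins() {
    src_dir="${1:-/opt/collectd/sbin}"   # default prefix of a source build
    dest_dir="${2:-/usr/bin}"
    for name in collectd collectdmon; do
        if [ ! -x "$src_dir/$name" ]; then
            echo "missing $src_dir/$name" >&2
            return 1
        fi
        # -f replaces an existing link, so re-running the helper is harmless.
        ln -sf "$src_dir/$name" "$dest_dir/$name" || return 1
    done
}
```

For example, as root: link_collectd_bins /opt/collectd/sbin /usr/bin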

If you are sending collectd metrics to Graphite, please make sure that Graphite is installed and running before you start the collectd server. In this use case, Graphite and the collectd server are assumed to be on the same machine.

Setting up collectd server

Once collectd is installed as described above, modify the /opt/collectd/etc/collectd.conf file to contain the following:

Hostname "hostname"
FQDNLookup true
BaseDir "/opt/collectd/var/lib/collectd"
PIDFile "/opt/collectd/var/run/collectd.pid"
PluginDir "/opt/collectd/lib/collectd"
TypesDB "/opt/collectd/share/collectd/types.db"
Interval 10

LoadPlugin logfile
<Plugin logfile>
  LogLevel info
  File "/var/log/collectd.log"
  Timestamp true
  PrintSeverity true
</Plugin>

LoadPlugin network
<Plugin network>
  Listen "*" "25826"
</Plugin>

LoadPlugin interface
<Plugin interface>
  Interface "eth0"
</Plugin>

LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "graphing">
    Host "localhost"
    Port "2003"
    Protocol "tcp"
    LogSendErrors true
    Prefix "collectd."
    StoreRates true
    AlwaysAppendDS false
    EscapeCharacter "_"
  </Node>
</Plugin>

Adjust these values (hostname, interface name, Graphite host and port) for your network as needed.
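Before starting collectd, it can help to confirm that Graphite's carbon daemon accepts data on the port configured in write_graphite. Carbon's plaintext protocol takes one "<metric.path> <value> <unix-timestamp>" line per metric. A minimal sketch; graphite_line is a hypothetical helper and test.ping is an arbitrary example metric name.

```shell
#!/bin/sh
# Hypothetical helper: format one line of Graphite's plaintext protocol.
graphite_line() {
    path="$1"
    value="$2"
    ts="${3:-$(date +%s)}"   # default to the current Unix time
    printf '%s %s %s\n' "$path" "$value" "$ts"
}
```

For example:
# graphite_line test.ping 1 | nc localhost 2003
If carbon is listening, test.ping should appear in the Graphite web UI shortly afterwards. Depending on your nc variant, you may need to interrupt it (Ctrl-C) after the line is sent.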

Run collectd using:
# service collectd start

Some useful information for the setup:

Default directories for collectd:

Plugins: /opt/collectd/lib/collectd
Binaries: /opt/collectd/sbin/collectd
Configuration file: /opt/collectd/etc/collectd.conf

To run collectd directly, without the init script:
# /opt/collectd/sbin/collectd -C /opt/collectd/etc/collectd.conf

Test the collectd configuration (parse the config file and exit):
# collectd -t

Test the plugins (read all plugins once and exit):
# collectd -T

Check that collectd is listening on port 25826 (the network plugin uses UDP):
# netstat -naptul | grep "25826"
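The grep above also matches unrelated lines (for example a TCP connection whose port merely contains 25826). A slightly stricter check is sketched below; udp_port_listening is a hypothetical helper that reads netstat output on stdin and assumes the usual Linux layout, with the protocol in column 1 and the local address in column 4.

```shell
#!/bin/sh
# Hypothetical helper: succeed if netstat output (stdin) shows a UDP socket
# bound to the given local port. tcp lines are ignored.
udp_port_listening() {
    port="$1"
    awk -v p=":$port" '
        # Protocol is udp or udp6; the local address must end in ":<port>".
        $1 ~ /^udp/ && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

For example:
# netstat -naptul | udp_port_listening 25826 && echo "collectd network plugin is listening"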

Setting up collectd agent

Now we are going to install the collectd agent on each client machine and configure it to send metrics to the collectd server (not directly to Graphite). The clients do not need the write_graphite plugin, so they can use the older collectd RPMs available for CentOS/SL (e.g. from EPEL). On each client, run:

# yum install collectd collectd-utils

Modify the /etc/collectd.conf config file as required:

Hostname "hostname"
FQDNLookup true
BaseDir "/var/lib/collectd"
PIDFile "/var/run/collectd.pid"
PluginDir "/usr/lib/collectd"
TypesDB "/usr/share/collectd/types.db"
Interval 10
#Timeout 5
ReadThreads 5

LoadPlugin logfile
<Plugin logfile>
  LogLevel info
  File "/var/log/collectd.log"
  Timestamp true
  PrintSeverity true
</Plugin>

LoadPlugin network
<Plugin network>
  Server "collectd-server.domain.com" "25826"
</Plugin>

LoadPlugin cpu
LoadPlugin load
LoadPlugin disk
LoadPlugin memory
LoadPlugin processes

Include "/etc/collectd/filters.conf"
Include "/etc/collectd/thresholds.conf"

Be sure to configure the network plugin with your collectd server information.

With this configuration, client metrics are sent to the collectd server on port 25826, which forwards them to Graphite. If you want a nicer web front-end, you can use Grafana to show trends in the system metrics.
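To find an agent's metrics in Graphite, it helps to know how write_graphite names them. With Prefix "collectd." and EscapeCharacter "_" in the server config above, the dots in the agent's host name are replaced by "_". The sketch below predicts that prefix; graphite_host_prefix is a hypothetical helper, and the escape character is hard-coded to "_" to match that config.

```shell
#!/bin/sh
# Hypothetical helper: print the Graphite path prefix for an agent's metrics,
# i.e. <Prefix><host name with "." replaced by "_">.
graphite_host_prefix() {
    host="$1"
    prefix="${2:-collectd.}"   # matches Prefix "collectd." in write_graphite
    escaped=$(printf '%s' "$host" | tr '.' '_')
    printf '%s%s\n' "$prefix" "$escaped"
}
```

For example, graphite_host_prefix web01.domain.com prints collectd.web01_domain_com, so that agent's load average would be expected under a path such as collectd.web01_domain_com.load.load.shortterm.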

Enabling python plugin

If you wish to enable python and iptables plugin support, first install the development packages:

# yum install python-devel
# yum install iptables-devel

Then recompile collectd from source with these plugins enabled:
# cd collectd-version
# ./configure --enable-python --enable-iptables
# make
# make install

The process is the same for any additional plugins you may wish to add, e.g. mysql or postgresql: install the matching -devel package, then re-run ./configure.

While configuring the source package, check the "Modules" section of the ./configure output carefully to confirm that each plugin you need is enabled.
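That check can be automated if you save the configure output to a file. This is a sketch under an assumption: that each plugin line in the "Modules:" summary looks like "    python  . . . . . . . yes" (or "no (...)" when disabled). check_modules is a hypothetical helper; it prints every requested plugin that is not "yes" and fails if there is any.

```shell
#!/bin/sh
# Hypothetical helper: read saved ./configure output on stdin and report
# requested plugins whose "Modules:" summary line does not say "yes".
check_modules() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) need[w[i]] = 1 }
        /^[ \t]*Modules:/ { in_mod = 1; next }
        in_mod && ($1 in need) {
            # Strip the plugin name and the run of dots to keep only the status.
            status = $0
            sub(/^[ \t]*[^ \t]+[ .\t]*/, "", status)
            seen[$1] = 1
            if (status !~ /^yes/) { print $1 ": " status; bad = 1 }
        }
        END {
            for (p in need) if (!(p in seen)) { print p ": not found in output"; bad = 1 }
            exit bad
        }
    '
}
```

For example:
# ./configure --enable-python --enable-iptables | tee configure.log
# check_modules python iptables < configure.log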


Some useful links that I came across during the setup:
  • https://collectd.org/wiki/index.php
  • https://collectd.org/wiki/index.php/Match:Hashed/Config
  • http://blog.matthewdfuller.com/2014/06/sending-collectd-metrics-to-graphite.html
  • https://keptenkurk.wordpress.com/2015/08/28/using-collectd-with-multiple-networked-instances/
  • http://giovannitorres.me/enabling-almost-all-collectd-plugins-on-centos-6.html