Tuesday, 12 May 2015

Elasticsearch error - Exception in thread ">output" org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]

After adding a new Elasticsearch node, I initially struggled with this warning:

WARN: org.elasticsearch.discovery: [logstash-id.xxx.in-25379-6424] waited for 30s and no initial state was set by the discovery

and I corrected the situation by adding iptables rules.

After that, things appeared to run smoothly, but even after a few minutes I could not find the new Elasticsearch index in the Elasticsearch head plugin. So I started searching through the debug logs and spotted this error:

log4j, [2015-05-12T11:00:02.003] DEBUG: org.elasticsearch.discovery.zen: [logstash-id.xxx.in-26328-4264] filtered ping responses: (filter_client[true], filter_data[false]) {none}
Exception in thread ">output" org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
    at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$3.onTimeout(org/elasticsearch/action/support/master/TransportMasterNodeOperationAction.java:180)
    at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(org/elasticsearch/cluster/service/InternalClusterService.java:492)
log4j, [2015-05-12T11:00:06.507] DEBUG: org.elasticsearch.discovery.zen: [logstash-id.xxx.in-26328-4264] filtered ping responses: (filter_client[true], filter_data[false]) {none}
^CInterrupt received. Shutting down the pipeline. {:level=>:warn, :file=>"logstash/agent.rb", :line=>"119"}

Since I was using the latest versions of Logstash and Elasticsearch, I was a bit puzzled, as all the solutions (references) I found on Google referred to older versions:

[root@id admin]# /opt/logstash/bin/logstash --version
logstash 1.4.2-modified

[root@es2 ~]# rpm -qa|grep elastic -i
elasticsearch-1.4.4-1.noarch

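Finally, after reading the (good) Logstash documentation, I decided to add the 'protocol' option and voilà, it worked! A likely explanation: without an explicit protocol, this Logstash version uses the "node" protocol, which tries to join the cluster as a client node through zen discovery on the transport port (9300 by default), whereas the firewall rules I had added only opened the HTTP port 9200.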

output {
        #stdout { codec => rubydebug }
        if [type] == "netflow" {
                elasticsearch {
                        cluster => "elk-cluster"
                        index => "netflow-%{+YYYY.MM.dd}"
                        host => "10.4.0.47"
                        protocol => "http"
                        workers => 2
                }
        }
}
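
Because protocol => "http" makes Logstash talk to Elasticsearch over the HTTP port, one quick sanity check from the Logstash host is to ask that port for the cluster name. This is only a minimal sketch and not part of the setup described above: the function names cluster_name_from_json and es_cluster_name are made up for illustration, and it assumes curl and sed are installed and that Elasticsearch listens on the default HTTP port 9200.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.

# Read a cluster health JSON document on stdin and print its cluster_name.
cluster_name_from_json() {
    sed -n 's/.*"cluster_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Query HOST (port defaults to 9200) over HTTP and print the cluster name.
# Prints nothing if the port is unreachable or the reply is not as expected.
es_cluster_name() {
    local host="$1" port="${2:-9200}"
    curl -s --max-time 5 "http://$host:$port/_cluster/health" | cluster_name_from_json
}

# Usage (host taken from the config above):
#   es_cluster_name 10.4.0.47
```

If the printed name matches the cluster value in the Logstash output section, the HTTP path is working; an empty result points back at the firewall or at the host address.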


Elasticsearch warning - WARN: org.elasticsearch.discovery: [...] waited for 30s and no initial state was set by the discovery

I added a new Elasticsearch node and was puzzled by this warning:

log4j, [2015-05-12T10:42:18.996]  WARN: org.elasticsearch.discovery: [logstash-id.xxx.in-25379-6424] waited for 30s and no initial state was set by the discovery
^C

It turned out that iptables was blocking access to the Elasticsearch host. After I added the firewall rules, things worked flawlessly.

Allow Elasticsearch in the iptables (firewall) rules:

#Allow elasticsearch access
iptables -A INPUT -s 10.4.0.45 -i eth0 -p tcp --dport 9200 -j ACCEPT
iptables -A OUTPUT -d 10.4.0.45 -o eth0 -p tcp --dport 9200 -j ACCEPT
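
To confirm that the rules actually opened the port, a TCP connection test from the other machine is usually enough. The sketch below is an assumption-laden helper rather than anything from the original setup: check_port is a hypothetical name, and it relies on bash's /dev/tcp redirection (enabled in most distribution builds of bash) and on the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if a TCP connection to HOST:PORT succeeds
# within WAIT seconds (default 3), non-zero otherwise.
check_port() {
    local host="$1" port="$2" wait="${3:-3}"
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage (Elasticsearch host and port from the posts above):
#   if check_port 10.4.0.47 9200; then
#       echo "port 9200 reachable"
#   else
#       echo "port 9200 blocked or closed"
#   fi
```

A failure here before the iptables change and a success afterwards is a simple way to verify that the firewall was the culprit.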

Elasticsearch curator - delete commands

If you wish to delete old Elasticsearch indices, curator is the de facto standard. The program provides many useful options, and I appreciate its versatility. Although curator has a wiki and a page on the Elasticsearch site, things were not clear to me on the first read. So here are my notes on deleting Elasticsearch indices:

[admin@mgr ~]$ curator --version
curator, version 3.0.0

[admin@mgr ~]$ /usr/bin/curator --host 10.4.0.46 delete --help
Usage: curator delete [OPTIONS] COMMAND [ARGS]...

  Delete indices or snapshots

Options:
  --disk-space FLOAT  Delete indices beyond DISK_SPACE gigabytes.
  --reverse BOOLEAN   Only valid with --disk-space. Affects sort order of the
                      indices.  True means reverse-alphabetical (if dates are
                      involved, older is deleted first).  [default: True]
  --help              Show this message and exit.

Commands:
  indices    Index selection.
  snapshots  Snapshot selection.


[admin@mgr ~]$ /usr/bin/curator --host 10.4.0.46 delete indices --help
Usage: curator delete indices [OPTIONS]

  Get a list of indices to act on from the provided arguments, then perform
  the command [alias, allocation, bloom, close, delete, etc.] on the
  resulting list.

Options:
  --newer-than INTEGER            Include only indices newer than n time_units
  --older-than INTEGER            Include only indices older than n time_units
  --prefix TEXT                   Include only indices beginning with prefix.
  --suffix TEXT                   Include only indices ending with suffix.
  --time-unit [hours|days|weeks|months]
                                  Unit of time to reckon by
  --timestring TEXT               Python strftime string to match your index
                                  definition, e.g. 2014.07.15 would be
                                  %Y.%m.%d
  --regex TEXT                    Provide your own regex, e.g
                                  '^prefix-.*-suffix$'
  --exclude TEXT                  Exclude matching indices. Can be invoked
                                  multiple times.
  --index TEXT                    Include the provided index in the list. Can
                                  be invoked multiple times.
  --all-indices                   Do not filter indices.  Act on all indices.
  --help                          Show this message and exit.

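To preview what curator would delete without deleting anything, use the dry-run feature. Two things to watch in the commands below: the backslashes before % are needed only when the command runs from a crontab entry (cron turns an unescaped % into a newline), so in an interactive shell the timestring can be written as '%Y.%m.%d'; and since the help output above describes --all-indices as "Do not filter indices", check the dry-run output to confirm that --older-than and --prefix are still honoured before running a real delete: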

[admin@mgr ~]$ /usr/bin/curator --host 10.4.0.46 --dry-run --debug delete indices --time-unit days --timestring "\%Y.\%m.\%d" --older-than 7 --prefix logstash- --all-indices
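
To make the --older-than / --timestring selection more concrete, here is a rough shell sketch of the date comparison behind it. It only approximates the idea and is not curator's actual code: the function name index_is_older is hypothetical, and it assumes GNU date and index names made of a prefix followed by a %Y.%m.%d date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 when INDEX is older than DAYS days,
# exit 1 when it is not, exit 2 when a date cannot be parsed.
# Usage: index_is_older INDEX PREFIX DAYS [TODAY as YYYY-MM-DD]
index_is_older() {
    local index="$1" prefix="$2" days="$3"
    local today="${4:-$(date -u +%F)}"
    local stamp day index_epoch today_epoch
    stamp="${index#"$prefix"}"                  # logstash-2015.05.01 -> 2015.05.01
    day="$(printf '%s' "$stamp" | tr '.' '-')"  # 2015.05.01 -> 2015-05-01
    index_epoch="$(date -u -d "$day" +%s 2>/dev/null)" || return 2
    today_epoch="$(date -u -d "$today" +%s 2>/dev/null)" || return 2
    # Older means: the index date falls before (today - DAYS days).
    [ "$index_epoch" -lt $(( today_epoch - days * 86400 )) ]
}
```

For example, with today fixed at 2015-05-12, index_is_older logstash-2015.05.01 logstash- 7 2015-05-12 succeeds, while the same call for logstash-2015.05.10 fails, so only the first index would be a deletion candidate.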

Always specify a logfile and a loglevel (CRITICAL = 50, ERROR = 40, WARNING = 30, INFO = 20, DEBUG = 10, NOTSET = 0) so that curator's output is captured for debugging, and give a sufficient timeout. The timeout becomes especially important when you are taking snapshots of indices, so set it generously:

[admin@mgr ~]$ /usr/bin/curator --host 10.4.0.47 --timeout 300 --logfile curator_log.txt --loglevel 10 --debug delete indices --time-unit days --timestring "\%Y.\%m.\%d" --older-than 7 --prefix logstash- --all-indices


[admin@mgr ~]$ /usr/bin/curator --host 10.4.0.46 --timeout 300 --logfile curator_log.txt --loglevel 10 --debug delete indices --time-unit days --timestring "\%Y.\%m.\%d" --older-than 7 --prefix logstash-apache --all-indices

If you wish to delete indices on the basis of disk space, use the following command:

[admin@mgr ~]$ /usr/bin/curator --host 10.4.0.46 --timeout 300 --logfile ttt.txt --loglevel 10 --debug delete --disk-space 40 indices --all-indices
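
One way to picture what --disk-space does with the default --reverse True, going only by the help text above, is a running total over the indices sorted in reverse-alphabetical order: everything that pushes the total past the limit is a deletion candidate. The sketch below illustrates that reading and nothing more. The function name indices_beyond_space and the "name size_in_gb" input format are invented here; curator itself obtains the sizes from Elasticsearch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines of "index_name size_in_gb" on stdin and
# print the indices that fall beyond LIMIT_GB once sizes are summed in
# reverse-alphabetical order (newest first for date-stamped names).
indices_beyond_space() {
    local limit_gb="$1"
    LC_ALL=C sort -r -k1,1 |
        awk -v limit="$limit_gb" '{ total += $2; if (total > limit) print $1 }'
}

# Usage:
#   printf '%s\n' 'logstash-2015.05.10 20' \
#                 'logstash-2015.05.11 15' \
#                 'logstash-2015.05.12 10' | indices_beyond_space 40
```

With the three sample lines and a 40 GB limit, the two newest indices add up to 25 GB and are kept, and the oldest one takes the total to 45 GB, so it is the only name printed.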