How to monitor Elasticsearch using Nagios

Asked 23/4, 2012 at 8:10 Answered 5/3, 2020 at 11:3

I would like to monitor Elasticsearch using Nagios. Basically, I want to know if Elasticsearch is up.

I think I can use the Elasticsearch Cluster Health API (see here) and use the 'status' that I get back (green, yellow or red), but I still don't know how to use Nagios for that (Nagios is on one server and Elasticsearch is on another server).
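For context, a Nagios plugin reports its result through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus one line of text. A minimal sketch of mapping the 'status' field onto those codes could look like the following. The function name nagios_state and the sample body are made up for illustration, and no HTTP request is made here.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Cluster Health "status" value -> Nagios exit code.
STATUS_TO_EXIT = {"green": OK, "yellow": WARNING, "red": CRITICAL}


def nagios_state(health_body):
    """Return (exit_code, message) for a raw /_cluster/health response body."""
    try:
        status = json.loads(health_body)["status"].lower()
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not JSON, or no usable "status" field.
        return UNKNOWN, "UNKNOWN - could not read cluster status"
    code = STATUS_TO_EXIT.get(status, UNKNOWN)
    return code, "%s - cluster status is %s" % (LABELS[code], status)


# Hand-written sample body, not real cluster output.
sample = '{"cluster_name": "demo", "status": "yellow"}'
code, message = nagios_state(sample)
print(message)
```

A real plugin would print the message and then call sys.exit(code) so that Nagios picks up the state.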

Is there another way to do that?

EDIT : I just found that - check_http_json. I think I'll try it.

Comma answered 23/4, 2012 at 8:10 Comment(0)

After a while, I managed to monitor Elasticsearch using NRPE. I wanted to use the Elasticsearch Cluster Health API, but I couldn't call it from another machine due to security restrictions. So on the monitoring server I created a new service whose check_command is check_nrpe!check_elastic. Then on the remote server, where Elasticsearch runs, I edited the nrpe.cfg file and added the following:

command[check_elastic]=/usr/local/nagios/libexec/check_http -H localhost -u /_cluster/health -p 9200 -w 2 -c 3 -s green

This is allowed because the command runs locally on the Elasticsearch server, so there are no security issues here.
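For readers unfamiliar with the check_http flags in the command above: -w and -c are response-time thresholds in seconds, and -s is a string that must appear in the response body. A rough Python model of that decision logic is sketched below. It is my approximation, not the plugin's source; the function name evaluate is hypothetical, and the exact boundary comparison the plugin uses may differ.

```python
# Nagios plugin exit codes used by this sketch.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(body, elapsed, warn=2.0, crit=3.0, expect="green"):
    """Approximate the outcome of `check_http -w 2 -c 3 -s green`.

    body    -- response body returned by /_cluster/health
    elapsed -- response time in seconds
    """
    state = OK
    # Response-time thresholds (-w / -c).
    if elapsed > crit:
        state = CRITICAL
    elif elapsed > warn:
        state = WARNING
    # Expected string (-s): a missing string is treated as critical.
    if expect not in body:
        state = CRITICAL
    return state


print(evaluate('{"status":"green"}', 0.2))
print(evaluate('{"status":"yellow"}', 0.2))
```

Under this model a yellow cluster is reported as CRITICAL rather than WARNING, because the body simply does not contain the string green.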

It works!!! I'll still try the check_http_json command that I posted in my question, but for now my solution is good enough.

Comma answered 9/5, 2012 at 12:28 Comment(1)

Thanks for figuring this out! In addition to working across systems to get around security issues, it is great for monitoring clusters on machines with differing directory structures. The check_http plugin is in 3 different directories on our various servers. This method lets me run the check, but lets the local machine manage the plugin path. Thanks again! – Huei 14/9, 2012 at 16:39

After playing around with the suggestions in this post, I wrote a simple check_elasticsearch script. It returns the status as OK, WARNING, or CRITICAL, corresponding to the "status" parameter in the cluster health response ("green", "yellow", and "red" respectively).

It also grabs all the other parameters from the health page and dumps them out in the standard Nagios format.

Enjoy!
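To illustrate what "the standard Nagios format" means here: plugin output is a line of text, optionally followed by a | and space-separated label=value performance data. The sketch below is not the linked script, just an assumed minimal version of the idea. The function name format_plugin_output and the sample values are invented for illustration.

```python
def format_plugin_output(health):
    """Build 'STATE - text | label=value ...' from a parsed health response."""
    status = health.get("status")
    label = {"green": "OK", "yellow": "WARNING", "red": "CRITICAL"}.get(status, "UNKNOWN")
    # Only numeric fields make sense as performance data; booleans are skipped.
    perfdata = " ".join(
        "%s=%s" % (key, value)
        for key, value in sorted(health.items())
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )
    return "%s - cluster status is %s | %s" % (label, status, perfdata)


# Hand-written sample resembling a cluster health response (values invented).
sample = {
    "cluster_name": "demo",
    "status": "yellow",
    "timed_out": False,
    "number_of_nodes": 3,
    "unassigned_shards": 5,
}
print(format_plugin_output(sample))
```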

Verenaverene answered 21/9, 2012 at 23:46 Comment(0)

Shameless plug: https://github.com/jersten/check-es

You can use it with ZenOSS/Nagios to monitor cluster health, data indices, and individual node heap usage.

Hedgepeth answered 3/11, 2014 at 23:9 Comment(1)

Can I check unassigned_shards with this? – Demivolt 12/6, 2017 at 20:8

You can use this cool Python script for monitoring your Elasticsearch cluster. The script checks the Elasticsearch status at your IP:port. This one and more Python scripts for monitoring Elasticsearch can be found here.

#!/usr/bin/python
from nagioscheck import NagiosCheck, UsageError
from nagioscheck import PerformanceMetric, Status
import urllib2
import optparse

try:
    import json
except ImportError:
    import simplejson as json


class ESClusterHealthCheck(NagiosCheck):

    def __init__(self):

        NagiosCheck.__init__(self)

        self.add_option('H', 'host', 'host', 'The cluster to check')
        self.add_option('P', 'port', 'port', 'The ES port - defaults to 9200')

    def check(self, opts, args):
        host = opts.host
        port = int(opts.port or '9200')

        try:
            response = urllib2.urlopen(r'http://%s:%d/_cluster/health'
                                       % (host, port))
        except urllib2.HTTPError, e:
            raise Status('unknown', ("API failure", None,
                         "API failure:\n\n%s" % str(e)))
        except urllib2.URLError, e:
            raise Status('critical', (e.reason))

        response_body = response.read()

        try:
            es_cluster_health = json.loads(response_body)
        except ValueError:
            raise Status('unknown', ("API returned nonsense",))

        cluster_status = es_cluster_health['status'].lower()

        if cluster_status == 'red':
            raise Status("CRITICAL", "Cluster status is currently reporting as "
                         "Red")
        elif cluster_status == 'yellow':
            raise Status("WARNING", "Cluster status is currently reporting as "
                         "Yellow")
        else:
            raise Status("OK",
                         "Cluster status is currently reporting as Green")

if __name__ == "__main__":
    ESClusterHealthCheck().run()

Zanezaneski answered 14/9, 2016 at 9:17 Comment(1)

Line 23 of the script (host = opts.host) should be changed to host = opts.host or 'localhost' – Niko 14/10, 2019 at 13:39

I wrote this a million years ago, and it might still be useful: https://github.com/radu-gheorghe/check-es

But it really depends on what you want to monitor. The above measures:

if Elasticsearch responds to HTTP
if ingestion rate drops under the defined levels
if total number of documents drops under the defined levels
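As a rough idea of how the ingestion-rate and document-count checks listed above can be computed (a sketch under my own assumptions, not necessarily how the linked plugin does it): take two document-count samples, divide the difference by the elapsed time, and alert when the value falls below a minimum. All names and thresholds here are hypothetical.

```python
# Nagios plugin exit codes used by this sketch.
OK, WARNING, CRITICAL = 0, 1, 2


def ingestion_rate(prev_count, prev_time, curr_count, curr_time):
    """Documents indexed per second between two samples."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (curr_count - prev_count) / float(elapsed)


def low_threshold_state(value, warn_below, crit_below):
    """Alert when a value drops under the given minimum levels."""
    if value < crit_below:
        return CRITICAL
    if value < warn_below:
        return WARNING
    return OK


# Two invented samples taken 60 seconds apart.
rate = ingestion_rate(1000, 0, 1600, 60)
print(rate, low_threshold_state(rate, warn_below=50, crit_below=5))
```

The same low_threshold_state helper covers the total-document check by passing the current count instead of the rate.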

But of course there's much more that might be interesting, from query time to JVM heap usage. We wrote a blog post about the most important ones here: https://sematext.com/blog/top-10-elasticsearch-metrics-to-watch/

Elasticsearch has APIs for all these, so you may be able to use a generic check_http_json to get the needed metrics. Alternatively, you may want to use something like Sematext Monitoring for Elasticsearch, which gets these metrics out of the box, then forward threshold/anomaly alerts to Nagios. (disclosure: I work for Sematext)
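To make the "generic JSON check" idea concrete, here is a small sketch that pulls one value out of a nested JSON document by a dotted path and compares it to thresholds, using JVM heap usage from a node stats response as the example. This is not check_http_json's actual interface. The function names, the 75/90 thresholds, and the sample document are assumptions for illustration.

```python
# Nagios plugin exit codes used by this sketch.
OK, WARNING, CRITICAL = 0, 1, 2


def extract(document, dotted_path):
    """Walk nested dicts along a dotted key path, e.g. 'jvm.mem.heap_used_percent'."""
    current = document
    for key in dotted_path.split("."):
        current = current[key]
    return current


def worst_heap_state(node_stats, warn=75, crit=90):
    """Return the worst state across all nodes based on heap usage percent."""
    worst = OK
    for node_id, stats in node_stats["nodes"].items():
        used = extract(stats, "jvm.mem.heap_used_percent")
        if used >= crit:
            worst = max(worst, CRITICAL)
        elif used >= warn:
            worst = max(worst, WARNING)
    return worst


# Hand-written sample shaped like a node stats response (values invented).
sample = {
    "nodes": {
        "node-a": {"jvm": {"mem": {"heap_used_percent": 40}}},
        "node-b": {"jvm": {"mem": {"heap_used_percent": 80}}},
    }
}
print(worst_heap_state(sample))
```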

Kriss answered 5/3, 2020 at 11:3 Comment(0)
