How to combine process and file check in monit?

Summary

How can I combine multiple checks in Monit? I want to check against process activity and file content/timestamp.


Long and boring explanation

I'm working on a Monit daemon for keeping my Bukkit Minecraft server up. It does several checks. At the moment I have this code:

#!monit

check process bukkit pidfile /var/run/bukkit.pid # check if the java process is running
    start program = "/sbin/start bukkit"         # start with Upstart
    stop program  = "/sbin/stop bukkit"          # stop with Upstart

    if failed                                    # send a noop request to check whether the server responds
        host cubixcraft.de port 20059 protocol http
        and request "/api/call?method=runConsoleCommand&args=%5B%22noop%22%5D&key=d9c7f3f6be0c92c1b2725f0e5a3352514cee0885c3bf7e0189a76bbaf2f4d7a7"
            with checksum e006695c8da58e03f17a305afd1a1a32
            timeout 20 seconds for 2 cycles
    then restart                                 # restart if it fails

It works... but it's slow. If something goes wrong, I have to wait 20 seconds until the server gets terminated. I need that timeout, though, because the server reloads from time to time (to refresh the configuration, clean up memory, etc.), and each reload causes a short lag. Without the "timeout 20 seconds for 2 cycles" the server would be terminated immediately whenever it reloads.

Waiting 20 seconds for a restart would be acceptable in itself. But most of the time, when something goes wrong, all security mechanisms on the server stop working as well.

Because of that, I need a way to restart the server immediately if it doesn't respond, but give it some time when it is reloading.

I have this approach: the server writes something to a logfile whenever any command is issued (including reloads and the API calls I use to check the server status), so the timestamp of the logfile is the timestamp of the last command. During a reload, nothing gets written to the file. So I can detect a reload with a simple timestamp check, and only if the server is currently reloading do I give it its 20 seconds.
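One possible reading of the approach above is a three-way decision: the server answered the noop request, it is silent but the logfile was written recently (so it may be reloading), or it has been silent for longer than the grace period. The following is only a minimal sketch of that decision in shell. The function name decide_action and its arguments are hypothetical and not part of Monit or Bukkit, and reading "logfile written recently" as "probably reloading" is an assumption about what the timestamp check is meant to capture.

```shell
#!/bin/sh
# Hypothetical helper (not a Monit or Bukkit feature).
# Usage: decide_action <api_ok: 1 or 0> <log_age_seconds> <grace_seconds>
# Prints one of: ok, wait, restart
decide_action() {
    api_ok=$1      # 1 if the noop API request succeeded, 0 otherwise
    log_age=$2     # seconds since the logfile was last modified
    grace=$3       # grace period for a reload, e.g. 20

    if [ "$api_ok" -eq 1 ]; then
        echo "ok"        # server responded, nothing to do
    elif [ "$log_age" -lt "$grace" ]; then
        echo "wait"      # assumption: recent log activity means a reload may be in progress
    else
        echo "restart"   # no response and no log activity within the grace period
    fi
}
```

For example, decide_action 0 5 20 prints "wait", while decide_action 0 25 20 prints "restart". Monit's "check program" test can run a script and act on its exit status, so a wrapper script around a function like this might be one way to hook it into the configuration. How the log age is obtained (for instance from the file's modification time) depends on the system and is left open here.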

Ursala answered 6/2, 2012 at 18:20

I managed to do this by overriding the start program:

start program = "/bin/bash -c '/usr/bin/monit unmonitor bukkit; /sbin/start bukkit; sleep 20; /usr/bin/monit monitor bukkit'" with timeout 25 seconds

This worked in Monit 5.5, but in Monit 5.14 it only works sometimes. Because Monit 5.14 receives the unmonitor request while it is still starting the program, it waits for the start to finish before actually performing the unmonitor, which means the monitor request fires too early and gets rejected.

Debunk answered 29/12, 2015 at 5:35