Get monit to alert first and restart later

I would like to handle a kind of chain action in monit.

  • check for a process and alert immediately.
  • restart the process after a number of cycles.

My attempts so far:

check process myprocess with pidfile /run/my.pid
  start program = "/path/to/binary start" with timeout 60 seconds
  stop program = "/path/to/binary stop" with timeout 60 seconds
  if not exist for 3 cycles then restart
  if not exist then alert
  if 3 restarts within 3 cycles then timeout

This does not alert and stays in state "running" while the PID is missing, but it does restart after the 3 cycles.

check process myprocess with pidfile /run/my.pid
  start program = "/path/to/binary start" with timeout 60 seconds
  stop program = "/path/to/binary stop" with timeout 60 seconds
  if not exist for 3 cycles then restart
  if children < 1 for 1 cycles then alert
  if 3 restarts within 3 cycles then timeout

No alert for children < 1, but it restarts after 5.

monit.log

[CEST Aug  1 15:09:30] error    : 'myprocess' process is not running

monit summary

Process 'myprocess'            Running

Here is the relevant part of the monit -v output:

Existence      = if does not exist 3 times within 3 cycle(s) then restart else 
                 if succeeded 1 times within 1 cycle(s) then alert
Pid            = if changed 1 times within 1 cycle(s) then alert
Ppid           = if changed 1 times within 1 cycle(s) then alert
Children       = if less than 1 1 times within 1 cycle(s) then alert else if 
                 succeeded 1 times within 1 cycle(s) then alert
Timeout        = If restarted 3 times within 3 cycle(s) then unmonitor

So the question: is it possible to send an alert and change the status to 'not running' within 1 cycle and restart after 3?

Lankford answered 1/8, 2014 at 13:17 Comment(2)
When you say "does not alert", do you mean you have set up global/local email alerts for Monit and it does not send them accordingly? - Ingulf
monit is set up correctly. All alerts and emails are fine. As I wrote, monit does not alert at all and stays in state "running" while the PID is missing, but restarts after the 3 cycles. - Lankford

EDIT (IMPORTANT): See the comments below for newer versions of Monit (as of Feb. 2019), where this behaviour has been improved.


This line:

if does not exist for 3 cycles then restart

Means the following:

Do not perform any action until you have checked 3 times that the service does not exist, then restart it. This behaviour is described in monit's documentation as Failure Tolerance:

FAILURE TOLERANCE

By default the action is executed if it matches and the service set in an error state. However, you can require a test to fail more than once before the error event is triggered and the service state changed to failed. This is useful to avoid getting alerts on spurious errors, which can happen, especially with network tests.

Syntax:

FOR CYCLES ... or:

[TIMES WITHIN] CYCLES ...

Accordingly, Monit won't change the service's status until the test has failed for the specified number of cycles. To confirm this, remove the fault tolerance for this service and use only:

if does not exist then alert

Then stop the service manually and confirm that the command

monit status

now shows the status "Does not exist" as soon as you stop it.

So, back to your questions:

  1. Yes, it is possible to send an alert (by email) within 1 cycle. For that, you need to define the option "if does not exist then alert" for that service and set up email alerts correctly. Assuming you would like to use an external mail server, you need to define at least two lines (configuration example with Gmail):

SMTP SERVER CONFIGURATION

set mailserver smtp.gmail.com PORT 587 USERNAME "[email protected]" PASSWORD "xxxxx" using TLSV1 with timeout 30 seconds

(Be aware that in Gmail you must enable access for "less secure" apps in order to allow monit to use the SMTP service.)

and

EMAIL RECIPIENT

set alert [email protected]

both in the file /etc/monit/monitrc. Refer to the official documentation for more information about these two lines.

  2. As far as the documentation says, it is not possible to update the status of the service immediately if a fault tolerance (perform action after X cycles) is defined. But you can still define alerts to be sent immediately and restart the service after the desired number of cycles.
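
As a minimal sketch of that last point, the two rules can sit side by side in one service entry. It reuses the pidfile and start/stop paths from the question. According to the comments, this combination alerts from the first cycle and restarts at the third on Monit 5.25.2, while the asker saw no immediate alert with it on an older version, so treat it as something to test on your own version rather than a guaranteed result.

```
check process myprocess with pidfile /run/my.pid
  start program = "/path/to/binary start" with timeout 60 seconds
  stop program = "/path/to/binary stop" with timeout 60 seconds
  # No fault tolerance here: intended to alert on the first failed cycle
  if does not exist then alert
  # Fault tolerance of 3 cycles: restart only after 3 consecutive failures
  if does not exist for 3 cycles then restart
  if 3 restarts within 3 cycles then timeout
```

After editing /etc/monit/monitrc, the syntax can be checked with monit -t before reloading.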

References:

Monit's documentation: https://mmonit.com/monit/documentation/monit.html

Hope it helps!

Regards

Inextricable answered 27/10, 2015 at 11:26 Comment(4)
Could someone explain to me why people are downvoting this answer? o_O - Ingulf
Please have a look at the first configuration. There I tried "if not exist then alert" without effect. - Lankford
I just did some tests with what appears to be the latest version of monit (5.25.2) and it looks like they got rid of the failure tolerance. As far as I can test, the status of the process now changes immediately after the first cycle. You can define "if does not exist for 3 cycles then restart" and "if does not exist then alert" together and it actually works as intended (alerts from the first cycle and restarts at the third). I'd suggest updating to the latest version and seeing if your config works. If it does, I'd update the answer. - Ingulf
Nah, they didn't get rid of the failure tolerance. It's now called "Fault Tolerance" instead and looks pretty much like before, but in any case the behavior is now different and the status of the process does get changed after the first error cycle rather than at the end of the specified cycles, so it really should work as you expected. Again, try to update and share :-) - Ingulf
