monit: golang site restarts slowly and status shows "Does not exist"

I created a monit config that must restart my golang site when it crashes:

$ cd /etc/monit/conf.d 
$ vim checkSite 

It starts the program with nohup and saves its PID to a file:

check process site with pidfile /root/go/path/to/goSitePath/run.pid
    start program = "/bin/bash -c 'cd /root/go/path/to/goSitePath; nohup ./goSite > /dev/null 2>&1 & echo $! > run.pid'" with timeout 5 seconds
    stop program = "/bin/bash -c '/bin/kill -9 `cat /root/go/path/to/goSitePath/run.pid`'"
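For context, a pidfile-based check essentially reads the PID from the file and asks the kernel whether that process still exists. The Go sketch below does this with signal 0 (Linux/Unix only). It illustrates the idea under that assumption and is not monit's actual implementation; pidFromFile and processExists are hypothetical names, and the path is the placeholder from the start program above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// pidFromFile reads a PID file such as run.pid and parses the number in it.
func pidFromFile(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return 0, fmt.Errorf("bad pid in %s: %w", path, err)
	}
	return pid, nil
}

// processExists sends signal 0, which delivers nothing but still makes the
// kernel check whether the PID exists (ESRCH means it does not).
func processExists(pid int) bool {
	if pid <= 0 {
		return false // 0 and negative values address process groups, not one process
	}
	err := syscall.Kill(pid, 0)
	// EPERM: the process exists, but we are not allowed to signal it.
	return err == nil || errors.Is(err, syscall.EPERM)
}

func main() {
	// Placeholder path taken from the question's start program.
	pid, err := pidFromFile("/root/go/path/to/goSitePath/run.pid")
	if err != nil {
		fmt.Println("no usable pid file:", err)
		return
	}
	if processExists(pid) {
		fmt.Println("running with pid", pid)
	} else {
		fmt.Println("does not exist")
	}
}
```

Because the file is re-read on every check, a PID that changes between restarts is not a problem by itself, as long as each start rewrites the file.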

It starts OK:

Process 'site'
  status                            Running
  monitoring status                 Monitored
  pid                               29723
  parent pid                        1
  uptime                            2m 
  children                          0
  memory kilobytes                  8592
  memory kilobytes total            8592
  memory percent                    0.4%
  memory percent total              0.4%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Thu, 05 Mar 2015 07:20:32

Then, to test how it restarts after a crash, I killed the golang site manually.

Here I have two issues:

  1. The site restarts rather slowly: it takes 1 minute, although in the configuration I set "with timeout 5 seconds".
  2. The site's status in monit becomes "Does not exist", even after the site has in fact restarted. I guess this happens because the site's PID changes after it is killed and restarted, but I don't know how to overcome this.

Status after the restart:

Process 'site'
      status                            Does not exist
      monitoring status                 Monitored
      data collected                    Thu, 05 Mar 2015 08:04:44

How can I reduce the restart time, and how can I fix the site's status in monit?

monit log:

[Mar  5 08:04:44] error    : 'site' process is not running
[Mar  5 08:04:44] info     : 'site' trying to restart
[Mar  5 08:04:44] info     : 'site' start: /bin/bash
[Mar  5 08:06:44] info     : 'site' process is running with pid 31479

Update

My golang site is rather simple:

package main

import (
    "fmt"
    "github.com/go-martini/martini"
)

func main() {
    m := martini.Classic()

    m.Get("/", func() {
        fmt.Println("main page")
    })

    m.Run()
}

Update 2

I tried to make monit restart my golang site faster by removing the PID file itself. I ran kill 29723 && rm run.pid and started a timer to measure how long it took for the site to become accessible again. It took 85 seconds, so removing the PID file did not help monit restart the site any faster.

Industry answered 5/3, 2015 at 4:23 Comment(4)
It would help if you showed some code. To catch signals like SIGINT you'll need some code; see #11269443. - Bilinear
You might also try writing the PID from inside the Go program via os.Getpid(). - Bilinear
Try removing the PID file from inside the Go program, both in a deferred recover block and on interrupt signals, and write a new PID file whenever the process (re)starts. - Bilinear
@Bilinear please see Update and Update 2 above. I posted the site code and tested the hypothesis about removing the PID file by removing it manually. Unfortunately, removing the PID file did not help. - Industry
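A minimal sketch of what Bilinear suggests, assuming the site is started from its own directory so that run.pid lands where the monit pidfile points: the program writes its own PID with os.Getpid() and deletes the file on SIGINT/SIGTERM. writePIDFile and removePIDOnSignal are hypothetical helper names. With this in place, the echo $! > run.pid part of the start program becomes redundant. As Update 2 shows, deleting the PID file does not by itself make monit react faster.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"strconv"
	"syscall"
)

// writePIDFile records this process's own PID, overwriting any stale file.
func writePIDFile(path string) error {
	return os.WriteFile(path, []byte(strconv.Itoa(os.Getpid())+"\n"), 0o644)
}

// removePIDOnSignal deletes the PID file and exits on SIGINT or SIGTERM.
// SIGKILL (kill -9) cannot be caught, so the file is left behind in that case.
func removePIDOnSignal(path string) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-sigs
		os.Remove(path)
		os.Exit(0)
	}()
}

func main() {
	const pidPath = "run.pid" // relative to the working directory, as in the question
	if err := writePIDFile(pidPath); err != nil {
		fmt.Fprintln(os.Stderr, "cannot write pid file:", err)
		os.Exit(1)
	}
	defer os.Remove(pidPath) // runs only if main returns normally
	removePIDOnSignal(pidPath)

	// The real site would call m.Run() from the question here; it blocks
	// while the server is up.
}
```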

monit doesn't have any subscription mechanism to immediately discover that a process has died.

In daemon mode, as documented, monit works by periodically polling the status of all the configured rules. Its poll cycle is configured when the daemon starts and, in some Linux distributions, defaults to 2 minutes, which means that monit can need up to 2 minutes to take any action.

Check this setting in your monitrc; it's configured with the set daemon directive. For example, if you want to check the status every 5 seconds, set:

set daemon 5

On every cycle, monit updates its status and, depending on it, executes actions if needed. So if it detects that the process doesn't exist, it will report "Does not exist" until the next poll cycle, even if it has already decided to restart it.
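The timing can be sketched as a small model. It assumes that polls fire at fixed multiples of the interval from daemon start and that the site starts instantly: the restart is then issued at the first poll after the crash, and the status only flips back to "Running" at the following poll. nextPoll and timeline are hypothetical names, and 30 s is an arbitrary crash time chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// nextPoll returns the first poll time strictly after t, assuming polls happen
// at 0, interval, 2*interval, ... measured from daemon start (a simplification).
func nextPoll(t, interval time.Duration) time.Duration {
	return (t/interval + 1) * interval
}

// timeline models one crash: the restart is issued at the first poll after the
// crash, and the status is refreshed to "Running" only on the poll after that.
func timeline(crashAt, interval time.Duration) (restartAt, runningAt time.Duration) {
	restartAt = nextPoll(crashAt, interval)
	runningAt = restartAt + interval
	return restartAt, runningAt
}

func main() {
	crash := 30 * time.Second // hypothetical moment the site was killed
	for _, interval := range []time.Duration{120 * time.Second, 5 * time.Second} {
		restart, running := timeline(crash, interval)
		fmt.Printf("interval %v: restart after %v, status Running after %v\n",
			interval, restart-crash, running-crash)
	}
}
```

Under this model, the delay before the restart depends on where in the cycle the crash falls, and the "Does not exist" status persists for one more full cycle. The question's log (restart at 08:04:44, "running" at 08:06:44) shows a one-cycle gap consistent with a 2-minute interval; with set daemon 5 both delays shrink to a few seconds.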

The timeout in the start program directive has nothing to do with this poll cycle: it is the time monit gives the service to start. If the service doesn't start within this time, monit will report it.
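To illustrate the distinction, here is a sketch of what a start timeout means: after the start command is issued, wait up to the timeout for the service to come up, regardless of when the next poll happens. waitStarted is a hypothetical helper, and this models only the idea, not monit's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitStarted polls check every step until it reports true or timeout elapses.
// The timeout bounds this wait only; it does not shorten the poll cycle.
func waitStarted(check func() bool, timeout, step time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("service did not start within timeout")
		}
		time.Sleep(step)
	}
}

func main() {
	start := time.Now()
	// Hypothetical service that becomes ready 50ms after being launched.
	ready := func() bool { return time.Since(start) > 50*time.Millisecond }
	if err := waitStarted(ready, 5*time.Second, 10*time.Millisecond); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("started within the timeout")
}
```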

If monit doesn't meet your requirements, you can also try supervisord, which is always aware of the state of the programs it runs.

Laundry answered 7/4, 2015 at 22:29 Comment(0)
