failed to add munin node to monitoring

I'm trying to set up some new hosts in munin for monitoring. For some reason it ain't happening!

Here's what I've tried so far.

On the munin server, which is already monitoring several other hosts, I've added the host I want in /etc/munin/munin.conf

[db1]
    address   10.10.10.25 # <- obscured the real IP address 
    use_node_name yes

And on the db1 host I have this set in /etc/munin/munin-node.conf

host_name  db1.example.com
allow ^127\.0\.0\.1$
allow ^10\.10\.10\.26$
allow ^::1$
port 4949

And I made sure to restart the services on both machines.

From the monitoring host I can telnet to the new server I want to monitor on the munin port:

[root@monitor3:~] #telnet db1.example.com 4949
Trying 10.10.10.26...
Connected to db1.example.com.
Escape character is '^]'.
# munin node at db1.example.com
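The same check can be scripted instead of typed into telnet. This is only a minimal sketch in Python: `probe_node` is a hypothetical helper name, and it assumes the node greets with a banner line like the one above and answers the munin-node `list` command with a single line of plugin names.

```python
import socket


def probe_node(host: str, port: int = 4949, timeout: float = 5.0) -> dict:
    """Connect to a munin-node, read its banner, and ask for its plugin list.

    Hypothetical helper for illustration; not part of munin itself.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            # First line sent by the node, e.g. "# munin node at db1.example.com"
            banner = stream.readline().strip()

            # "list" asks the node which plugins it offers
            stream.write("list\n")
            stream.flush()
            plugins = stream.readline().split()

            # Close the session politely
            stream.write("quit\n")
            stream.flush()

    return {
        "banner": banner,
        # Last word of the banner is the name the node reports for itself
        "node_name": banner.rsplit(" ", 1)[-1],
        "plugins": plugins,
    }
```

Calling `probe_node("db1.example.com")` from the monitoring host would return the name the node reports for itself, which can then be compared with the section name used in munin.conf. An empty banner would mean the connection was closed before any greeting was sent.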

Wait a few minutes.. and nothing! The new server won't appear in the munin dashboard on the munin monitoring host.

In the /var/log/munin/munin-update.log log on the db1 host (the one I'm trying to monitor) I find this:

2015/11/30 03:20:02 [INFO] starting work in 14199 for db1/10.10.10.26:4949.

2015/11/30 03:20:02 [FATAL] Socket read from db1 failed.  Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254.

2015/11/30 03:20:02 [ERROR] Munin::Master::UpdateWorker<db1;db1> died with '[FATAL] Socket read from db1 failed.  Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254.
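To see which nodes hit this error and when, the log can be scanned with a short script. This is a rough sketch in Python that assumes every entry follows the `date time [LEVEL] message` layout of the three lines above; `failing_nodes` is a hypothetical helper name, not a munin tool.

```python
import re
from collections import defaultdict

# Layout assumed from the quoted log lines: "YYYY/MM/DD HH:MM:SS [LEVEL] message"
LOG_LINE = re.compile(
    r"^(?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)
SOCKET_READ = re.compile(r"Socket read from (?P<node>\S+) failed")


def failing_nodes(lines):
    """Map each node name to the timestamps of its 'Socket read ... failed' entries."""
    failures = defaultdict(list)
    for line in lines:
        entry = LOG_LINE.match(line.strip())
        # Only count [FATAL] entries; the [ERROR] line repeats the same message
        if entry is None or entry["level"] != "FATAL":
            continue
        hit = SOCKET_READ.search(entry["message"])
        if hit:
            failures[hit["node"]].append(entry["stamp"])
    return dict(failures)
```

Passing an open handle on /var/log/munin/munin-update.log to `failing_nodes` gives one list of failure times per node name.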

What could be going on here? And how can I solve this?

Amabel answered 1/12, 2015 at 6:39 Comment(5)
Check if port is available? - Illuminism
What about the node's logs? Do they say anything about it? - Aulea
10.10.10.25 != 52.3.28.48 - Voroshilovgrad
john Smith, you caught me attempting to obfuscate the IPs. I just corrected the post so that it makes logical sense.
Somnath Muluk - the ports are available on both hosts:
monitor3:
[root@monitor3:~] #lsof -i :4949
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
munin-nod 31800 root 5u IPv6 31820297 0t0 TCP *:munin (LISTEN)
db1:
[root@db1:~] #lsof -i :4949
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
munin-nod 14164 root 5u IPv6 26604748 0t0 TCP *:munin (LISTEN)
muru the log I posted is from the db1 host that I am trying to monitor. - Amabel
@Amabel that is very surprising. The log is what I would expect to see on a master (monitor3, in this case). Note how it says "starting work ... for node/ip:port". Indeed, munin-update.log would be on the master, not the node. - Aulea

Since you have already verified that your network connection is OK, as a first step of investigation I would simplify munin-node.conf. Currently you have:

host_name  db1.example.com
allow ^127\.0\.0\.1$
allow ^10\.10\.10\.26$
allow ^::1$
port 4949

From these I would remove:

  • host_name (it is probably redundant.)
  • The IPv6 loopback address. (I don't think you need it, but you can add it back later if you do need it)
  • The IPv4 loopback address. (same as above)

If it is still not working, you could rule out any issue with the allow config entirely by replacing the individual IPs with:

cidr_allow 10.10.10.0/24

This would allow connections from a full range of IPs, in case your munin server appears to be connecting to db1 from a different IP than you expect.
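To make the difference between the two directives concrete, here is a small Python sketch. It only imitates the matching rules (a regular expression tested against the peer address for allow, network membership for cidr_allow) and is not munin-node's actual implementation; `allowed_by_regex` and `allowed_by_cidr` are hypothetical helper names.

```python
import ipaddress
import re


def allowed_by_regex(peer_ip: str, patterns: list[str]) -> bool:
    """Mimic 'allow': the peer passes if any regular expression matches its address."""
    return any(re.search(pattern, peer_ip) for pattern in patterns)


def allowed_by_cidr(peer_ip: str, networks: list[str]) -> bool:
    """Mimic 'cidr_allow': the peer passes if its address lies inside any listed network."""
    address = ipaddress.ip_address(peer_ip)
    return any(address in ipaddress.ip_network(network) for network in networks)


# Patterns copied from the munin-node.conf shown above
allow_patterns = [r"^127\.0\.0\.1$", r"^10\.10\.10\.26$", r"^::1$"]

print(allowed_by_regex("10.10.10.26", allow_patterns))    # exact address listed
print(allowed_by_regex("10.10.10.25", allow_patterns))    # neighbouring address, not listed
print(allowed_by_cidr("10.10.10.25", ["10.10.10.0/24"]))  # whole /24 is accepted
```

With the regex form, only the exact addresses listed are accepted, so a master arriving from 10.10.10.25 would be refused while 10.10.10.0/24 accepts both. Keep in mind that a /24 prefix covers only 256 addresses: a network such as 0.0.0.0/24 does not contain 10.10.10.26, whereas 0.0.0.0/0 contains every IPv4 address.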

Pleomorphism answered 16/2, 2016 at 18:49 Comment(15)
Hi, OK, so I tried everything you mention except for cidr_allow, since I know what IP my munin server is coming from. My config on db1 looks like this:
[root@db1:/etc/munin] #egrep -v "^$|^#" munin-node.conf
log_level 4
log_file /var/log/munin-node/munin-node.log
pid_file /var/run/munin/munin-node.pid
background 1
setsid 1
user root
group root
ignore_file [\#~]$
ignore_file DEADJOE$
ignore_file \.bak$
ignore_file %$
ignore_file \.dpkg-(tmp|new|old|dist)$
ignore_file \.rpm(save|new)$
ignore_file \.pod$
allow ^54\.174\.234\.136$
host *
port 4949
And I restarted munin on both server and client. - Amabel
OK. A few things then: I would still try to use cidr_allow, just for debugging purposes. The allow setting relies on regular expressions, so there might be dragons. Also, what is your munin version? And finally: you forgot to anonymize your IP in the previous comment. - Pleomorphism
OK, thanks. I did try cidr_allow in the munin-node conf on db1. I tried first with the IP range of the munin server and then again with just cidr_allow 0.0.0.0/24. Though I am not sure if that's allowed: - Amabel
This is my munin-node conf on db1 on my last attempt:
[root@db1:/etc/munin] #egrep -v "^$|^#" munin-node.conf
log_level 4
log_file /var/log/munin-node/munin-node.log
pid_file /var/run/munin/munin-node.pid
background 1
setsid 1
user root
group root
ignore_file [\#~]$
ignore_file DEADJOE$
ignore_file \.bak$
ignore_file %$
ignore_file \.dpkg-(tmp|new|old|dist)$
ignore_file \.rpm(save|new)$
ignore_file \.pod$
allow ^10\.10\.10\.26$
cidr_allow 0.0.0.0/24
host *
port 4949
- Amabel
I reinstalled it on my machine, but I could not reproduce your error. So, last guess: in your munin.conf you are referring to your host by a short hostname (db1), but it identifies itself by its FQDN (db1.example.com). That is something munin can be sensitive about. Could you change munin.conf to use the FQDN as well? - Pleomorphism
I tried changing the hostname in munin.conf on the server to the host's FQDN. However, that didn't seem to have any effect. I think at this point the problem is with the server. I'm still seeing these lines in the munin-update log that I have in the OP: - Amabel
2016/02/17 03:20:02 [INFO] starting work in 22254 for db1/10.10.10.25:4949.
2016/02/17 03:20:02 [FATAL] Socket read from db1 failed. Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254.
2016/02/17 03:20:02 [ERROR] Munin::Master::UpdateWorker<db1;db1> died with '[FATAL] Socket read from db1 failed. Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254.
I think the answer must relate to that error. I'm just unsure how to address it. - Amabel
The "Socket read from db1 failed" message suggests that your change to the FQDN was not taken into account. It should read "Socket read from db1.example.com" if the change was properly applied. - Pleomorphism
I think I was looking up too high in the logs in that last post. The next thing I noticed after the name change to the FQDN was this happening in the logs: pastebin.ca/3375467 I didn't see any errors in that output. But I am still not seeing the node turn up in munin. - Amabel
Based on the log message you posted, you do have a proper connection to the node now. That is a good sign. Plugins are reporting warnings about some missing fields. If you are sure that the graphs have not been generated (check /var/cache/munin/www/index.html to be sure), then please check munin-html.log. - Pleomorphism
Sorry guys. I got really tired of dealing with this issue. It seemed to me that the issue was on the server end, and not the client. So I tried stopping the problematic munin server. Spun up a new one on AWS. Installed munin again, and voila! The problem clients started showing up in the munin dashboard. Lame, I know. But hey, it works! ;) Sorry guys. But the bounty stays with yours truly. I do appreciate your thought and input however. Not trying to be an asshole. But I solved the problem. - Amabel
I bumped into an e-mail of yours sent in December, so it is fully understandable. :) I am still wondering what the issue was, but you got it working, and that is what matters most. - Pleomorphism
Cool, thanks Gergely. I appreciate you understanding. I am having a couple of other sticking points with munin that I may post about on Stack Overflow. Haven't gotten as much help from the munin list as I'd like. I guess maybe it's not that trafficked at this point? - Amabel
Looks like its fame is slowly fading: google.com/trends/explore#q=munin ... - Pleomorphism
Yeah man, that's unfortunate. Munin is one of my favorite old standbys for RRD graphing. I'll keep using it despite its lack of popularity! - Amabel
