Nagios: CRITICAL - Socket timeout after 10 seconds

I've been running nagios for about two years, but recently this problem started appearing with one of my services.

I'm getting

CRITICAL - Socket timeout after 10 seconds

for a check_http -H my.host.com -f follow -u /abc/def check, which used to work fine. No other services are reporting this problem. The remote site is up and healthy, and I can run wget http://my.host.com/abc/def from the Nagios server, and it downloads the response just fine. Also, check_http -H my.host.com -f follow works just fine, i.e. things only break when I use the -u argument. I also tried passing a different user agent string, which made no difference, and increasing the timeout, with no luck. I tried -v, but all I get is:

GET /abc/def HTTP/1.0
User-Agent: check_http/v1861 (nagios-plugins 1.4.11)
Connection: close
Host: my.host.com


CRITICAL - Socket timeout after 10 seconds

... which does not tell me what's going wrong.

Any ideas how I could resolve this?

Thanks!

Prolong answered 24/10, 2011 at 3:18 Comment(7)
Have you tried adding -4 or -6 to the check_http options? I've had this problem before where I had to force IPv4 for a check. - Frisian
Thanks, I gave it a try. With -4 I get the same error. With -6 I get: Name or service not known HTTP CRITICAL - Unable to open TCP socket - Prolong
Can you post the output of your wget? I'm assuming since you are using follow that the target URL does a redirection. - Frisian
The -f follow might not really be necessary in this case, I just have it part of the command I use for all my services, because some of them do redirect. - Prolong
Here is the output from wget (with some obfuscation):
    --2011-11-16 23:04:34-- my.host.com/abc/def
    Resolving my.host.com... 174.xxx.yyy.zzz
    Connecting to my.host.com|174.xxx.yyy.zzz|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 6324686 (6.0M) [text/html]
    Saving to: `def'
    100%[==========================================================================================>] 6,324,686 5.97M/s in 1.0s
    2011-11-16 23:04:36 (5.97 MB/s) - `acr' saved [6324686/6324686]
- Prolong
How much time does resolving take? Could you please post the result of time host my.host.com there? - Holton
real 0m0.377s user 0m0.000s sys 0m0.000s - Prolong
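
The -4 and -6 results in the comments can be cross-checked outside of Nagios. The following is a minimal sketch in Python, not part of check_http; the helper name resolve is hypothetical and my.host.com is the placeholder hostname from the question. It asks the system resolver for addresses in one address family at a time. An empty list for IPv6 would be consistent with the "Name or service not known" message seen with -6.

```python
import socket


def resolve(host, family):
    """Return the addresses `host` resolves to for one address family.

    An empty list means the lookup failed for that family.
    """
    try:
        infos = socket.getaddrinfo(host, 80, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = "my.host.com"  # placeholder hostname from the question
    print("IPv4:", resolve(host, socket.AF_INET))
    print("IPv6:", resolve(host, socket.AF_INET6))
```

If only the IPv4 list is non-empty, the host simply has no IPv6 address, and the -6 failure is unrelated to the timeout.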

Try using the -N option of check_http.

I ran into similar problems, and in my case the web server didn't terminate the connection after sending the response (https was working, http wasn't). check_http tries to read from the open socket until the server closes the connection. If that doesn't happen then the timeout occurs.

The -N option tells check_http to receive only the header, but not the content of the page / document.
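
The behaviour described above can be reproduced with a small experiment. This is only an illustrative sketch in Python, not check_http's actual implementation (the real plugin is written in C). The names start_lingering_server and fetch are hypothetical, and the local server is a stand-in that sends a response and then deliberately keeps the connection open. Reading until the server closes the socket runs into the timeout, while stopping after the headers, roughly what -N is described as doing, returns immediately.

```python
import socket
import threading


def start_lingering_server():
    """Start a local test server that answers but never closes the socket.

    Returns the (host, port) it listens on.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    open_connections = []  # keep references so the sockets stay open

    def serve():
        while True:
            conn, _ = listener.accept()
            open_connections.append(conn)
            conn.recv(4096)  # read the request; its contents are ignored
            conn.sendall(
                b"HTTP/1.0 200 OK\r\n"
                b"Content-Type: text/html\r\n"
                b"\r\n"
                b"<html>body</html>"
            )
            # Deliberately no conn.close(): the client never sees EOF.

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()


def fetch(address, path, headers_only, timeout):
    """Send a GET; read only the headers, or read until the server closes.

    Returns the HTTP status line. Raises socket.timeout if the server
    stays silent for longer than `timeout` seconds.
    """
    with socket.create_connection(address, timeout=timeout) as sock:
        request = (
            "GET {} HTTP/1.0\r\n"
            "Host: localhost\r\n"
            "Connection: close\r\n\r\n"
        ).format(path)
        sock.sendall(request.encode("ascii"))
        data = b""
        while True:
            if headers_only and b"\r\n\r\n" in data:
                break  # blank line seen: headers are complete
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection
            data += chunk
        return data.split(b"\r\n", 1)[0].decode("ascii")


if __name__ == "__main__":
    address = start_lingering_server()
    print(fetch(address, "/abc/def", headers_only=True, timeout=2))
    try:
        fetch(address, "/abc/def", headers_only=False, timeout=2)
    except socket.timeout:
        print("CRITICAL - Socket timeout after 2 seconds")
```

Raising the timeout does not help in this situation, which matches the question: the client is waiting for a close that never comes, not for slow data.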

Gaven answered 20/12, 2011 at 13:18 Comment(4)
Thank you, finally my service is not in "PROBLEM" state anymore! - Prolong
Cheers for the solution. However, connections not being terminated is a sign of a possible problem in the stack. Can OP comment on what change triggered it, if known? - Accrual
Had the same problem and it was due to an "optimising" network appliance. - Circumambulate
Information for Check_MK users: in WATO this option is named "Don't wait for document body". It fixed the issue for me too. - Triode

I tracked my issue down to an issue with the security providers configured in the most recent version of OpenSUSE.

Judging from other web pages, it appears to be caused by an attempt to use the TLSv2 protocol, which either does not work correctly or is missing something in the default configuration.

To overcome the problem I commented out the security provider in question from the JRE security configuration file.

#security.provider.10=sun.security.pkcs11.SunPKCS11

The number after security.provider. may be different in your configuration, but essentially the SunPKCS11 provider is at issue.

This configuration is normally found in

$JAVA_HOME/lib/security/java.security

of the JRE that you are using.

Theatrical answered 15/4, 2014 at 0:52 Comment(0)

Fixed with this command definition in nrpe.cfg (on Debian 6.0 Squeeze using nagios-nrpe-server):

command[check_http]=/usr/lib/nagios/plugins/check_http -H localhost -p 8080 -N -u /login?from=%2F
Hindman answered 2/6, 2014 at 9:41 Comment(0)

For whoever is interested, I stumbled on this problem too, and the cause turned out to be mod_itk on the web server.

A patch is available, although it does not seem to be included in the current CentOS or Debian packages:

https://lists.err.no/pipermail/mpm-itk/2015-September/000925.html

Daiseydaisi answered 2/3, 2017 at 17:53 Comment(0)

In my case the /etc/postfix/main.cf file was not configured properly. My mail relay host was not defined, and the relay restrictions were too strict. I had to add:

relayhost = mailrelay.ext.example.com

smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
Dierolf answered 22/6, 2021 at 10:24 Comment(0)
