Why am I getting intermittent Excon::Error::Socket: getaddrinfo: No address associated with hostname (SocketError)?

Asked 2/8, 2016 at 18:37 Answered 22/2, 2017 at 7:27

Rails 4 - Ruby 2.2.2 - Amazon AWS S3 - dragonfly 1.0.12 - dragonfly-s3_data_store 1.2 - fog-aws 0.10.0

Around 99% of the time we have no issues. The error usually happens when usage is high, but I have also seen it when there were almost no users. The line that raises the error:

 # excon/lib/excon/socket.rb
 # line 100 inside the connection method.
 addrinfo = ::Socket.getaddrinfo(*args)

The error happens everywhere in the application. ~~Sometimes the error is seen when there is not a remote connection.~~ - I am no longer able to verify this.

I used Rails loggers to capture the arguments being passed in and there is seemingly no difference between a pass and a fail. Here are some examples:

 # PASS
 ["s3.amazonaws.com", 443, 0, 1, nil, nil, false]
 ["mybucket.s3.amazonaws.com", 443, 0, 1, nil, nil, false]

 # FAIL
 ["mybucket.s3-us-west-1.amazonaws.com", 443, 0, 1, nil, nil, false]
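Since the failing call is a plain DNS lookup, one way to separate DNS flakiness from anything Excon-specific is to repeat the same lookup outside the application and count the failures. The following is only a sketch: `resolution_failures` is a hypothetical helper name, and the `resolver:` parameter is an assumption added so the loop can be exercised without a network.

```ruby
require 'socket'

# Hypothetical helper (not part of Excon or fog): resolve `host` repeatedly
# and return [attempt_index, error_message] for every lookup that fails.
# `resolver` defaults to Socket.getaddrinfo and can be swapped out for testing.
def resolution_failures(host, attempts: 100, resolver: Socket.method(:getaddrinfo))
  failures = []
  attempts.times do |i|
    begin
      # Same argument shape as the logged calls: host, port 443,
      # any address family (0), stream socket, no protocol/flags,
      # and no reverse lookup.
      resolver.call(host, 443, 0, Socket::SOCK_STREAM, nil, nil, false)
    rescue SocketError => e
      failures << [i, e.message]
    end
  end
  failures
end
```

Running something like `resolution_failures("mybucket.s3-us-west-1.amazonaws.com", attempts: 500)` on the application server would show whether the resolver itself fails intermittently for that host, independent of the gems involved.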

I came across several forum posts that led me to believe the excon gem needed an update. I upgraded Excon from 0.45.4 to 0.51.0 and also updated the Fog gem from 1.36.0 to 1.38.0.

After upgrading, the error went from "getaddrinfo: Name or service not known (SocketError)" to "Excon::Error::Socket: getaddrinfo: No address associated with hostname (SocketError)".

The URL captured for a failed request is different from the URLs that pass. I will look into this further.

UPDATE:

The dragonfly initializer specifies the same host as the one that fails. Because url_host overrides the default behavior, I decided to remove it.

 # myapp/config/initializers/dragonfly.rb
 ...
 url_host: 'mybucket.s3-us-west-1.amazonaws.com'

This resulted in no change. The same url is still used and is the only one that fails.

Sadness answered 2/8, 2016 at 18:37 Comment(11)

Could you share some of the pass/fail arguments for reference? Thanks. – Unbosom 3/8, 2016 at 17:34

The loggers were taken out when we upgraded the gem. I will add successful args now but I will not be able to provide a failed args list until tomorrow. – Sadness 3/8, 2016 at 18:54

It would seem as if the url is getting "-us-west-1" appended to it. This may be the cause of my woes. – Sadness 4/8, 2016 at 15:35

Hmm. It might be failed redirect following (which occurs when the connection and bucket are in different regions). Some of that can be a bit wonky at times. – Unbosom 4/8, 2016 at 19:52

@Unbosom do you have any advice for dealing with and/or debugging this issue? – Sadness 15/8, 2016 at 14:47

If you have only that URL failing, maybe you have your region written wrong somewhere, like here? Or if it fails from time to time, then I'd say that it's an excon problem since getaddrinfo() is about DNS and it might fail for various reasons (so maybe there is a need to retry). – Inveigh 15/8, 2016 at 18:55

How frequently is it failing? If it is a long running process, it might also be a caching issue? (ie DNS is correct on initial connection, but changes later). I'm not sure that would be very likely, but perhaps. I suppose it could also signal a broader networking error, but I would imagine that would show up more dramatically (and less regularly). Is it possible that some of the objects would be in a different region? This might also lead to issues. – Unbosom 16/8, 2016 at 19:0

We encounter this error fairly infrequently for the amount of things that go through excon. I would estimate we see the error 3 - 20 times a day. We have 1000s of users. We only have one bucket. I will contact Amazon and get more information about how our bucket is hosted. – Sadness 16/8, 2016 at 22:12

Yeah, afraid I haven't heard of other cases like this so I don't readily have other advice. On some level, networks are not to be trusted, so perhaps retries will be sufficient. Still, I would expect this to be hit more broadly if it were a general issue with S3 (and it is not being hit broadly to the best of my knowledge). – Unbosom 25/8, 2016 at 18:49
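The retry idea raised in the comments above could be sketched roughly as follows. This is a generic wrapper, not something provided by Excon, fog, or dragonfly: `with_retries` and its parameters are hypothetical names, and the exponential backoff is an assumption rather than anything the thread prescribes. It should only wrap operations that are safe to repeat.

```ruby
require 'socket'

# Hypothetical helper: run the block, retrying on the given error classes
# with exponential backoff. `sleeper` is injectable so the wait can be
# skipped or recorded in tests.
def with_retries(max_attempts: 3, base_delay: 0.2, on: [SocketError],
                 sleeper: Kernel.method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *on
    # Give up and re-raise the original error once attempts are exhausted.
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

In an application that uses Excon, the `on:` list would presumably need to include `Excon::Error::Socket`, since that is the class named in the error above. Excon also documents `:idempotent` and `:retry_limit` request options, though whether they can be passed through fog and dragonfly in this setup is not confirmed here.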

Is your application running on an EC2 instance? What operating system? If Linux, what's the output of sysctl net.ipv4.ip_local_port_range? – Plataea 6/9, 2016 at 7:39

It's S3. net.ipv4.ip_local_port_range = 32768 61000 – Sadness 28/9, 2016 at 14:36

I had this error, too. In my case, the culprit was either the server load (a slow file upload) or special characters in the filename. Since you also see this during low-usage times, you might want to look at the filenames that people upload. For me, the error typically occurred when someone uploaded a file with German umlauts (ä,ö,ü,ß) in the name of the file.

So please try to upload a file with some special character in the name and tell us whether this reproduces the error faithfully.

If this is the case, then simply escape the special characters or name the file differently. Here is a description of the special characters issue: https://github.com/markevans/dragonfly-s3_data_store/issues/6.
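As a rough illustration of "name the file differently", a filename could be normalized before it is stored. This is only a sketch: `safe_filename` and the `UMLAUT_MAP` constant are hypothetical, the transliteration table covers only the German characters mentioned above, and where such a helper would hook into dragonfly is not shown.

```ruby
# Hypothetical transliteration table for the characters mentioned above.
UMLAUT_MAP = {
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
  'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
  'ß' => 'ss'
}.freeze

# Hypothetical helper: return an ASCII-only version of `name`,
# keeping the extension unchanged.
def safe_filename(name)
  ext  = File.extname(name)
  base = File.basename(name, ext)
  # Transliterate known characters first, then replace any remaining run of
  # characters outside a conservative safe set with a single underscore.
  base = base.gsub(/[äöüÄÖÜß]/, UMLAUT_MAP)
  base = base.gsub(/[^A-Za-z0-9._-]+/, '_')
  "#{base}#{ext}"
end
```

For example, `safe_filename("Übergröße.pdf")` gives `"Uebergroesse.pdf"`. Uploading a file under both the original and the normalized name would be one way to check whether the filename is really what triggers the error.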

Elna answered 30/12, 2016 at 18:37 Comment(2)

We are seeing the error on simple GET requests as well as file upload/download. Sometimes the error is simply: SocketError: getaddrinfo: Name or service not known without the "Excon" portion as stated above. – Sadness 5/1, 2017 at 14:32

OK, if you're getting the error on simple GET requests, my explanation does not fit your case. I have no idea beyond what I described from my own experience above. Sorry :( – Elna 7/1, 2017 at 13:35

It might not solve your problem, but I have seen something like this in two cases:

The firewall restricted the port my system was configured to use.
My authorization/authentication credentials were wrong or outdated.

Botanist answered 22/2, 2017 at 7:27 Comment(0)
