Specs
Here's some background info on the system I'm running:
Ubuntu v14.04
Node v4.4.0
Node request module v2.69.0
All of this runs on a DigitalOcean droplet/server in a New York-based data center.
Problem Description
So I run the following js file:
var request = require('request');
var url = 'http://www.supremenewyork.com/';
request(url, function(err, res, body) {
  if (err) {
    console.log(err);
    return;
  }
  console.log('body:', body);
});
I run this on my droplet. At first roughly 70-80% of attempts failed; now every single time I try it, I get an ETIMEDOUT error like so:
{ [Error: connect ETIMEDOUT 52.6.25.180:80]
code: 'ETIMEDOUT',
errno: 'ETIMEDOUT',
syscall: 'connect',
address: '52.6.25.180',
port: 80 }
Of note, the errors seem to come in 'waves'. That is, I'll manage to get a handful of requests through for a certain period of time, followed by a string of ETIMEDOUT errors. Errors outnumber successful requests by a ratio of approximately 3:1.
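To put numbers on the 'waves' rather than eyeballing them, something like the following could log a series of attempts and group them into runs of consecutive successes and errors. This is only a sketch: the names summarizeOutcomes and pollSite, the attempt count, and the delay are my own choices, and the request call in the trailing comment is just one way to wire it up.

```javascript
// Summarize a sequence of request outcomes (true = success, false = error).
// Returns overall counts plus the lengths of consecutive same-outcome runs,
// which is one way to make the "waves" visible.
function summarizeOutcomes(outcomes) {
  var summary = { successes: 0, errors: 0, runs: [] };
  outcomes.forEach(function (ok) {
    if (ok) {
      summary.successes += 1;
    } else {
      summary.errors += 1;
    }
    var last = summary.runs[summary.runs.length - 1];
    if (last && last.ok === ok) {
      last.length += 1;
    } else {
      summary.runs.push({ ok: ok, length: 1 });
    }
  });
  return summary;
}

// Call makeRequest(cb) `count` times (count >= 1), waiting delayMs between
// attempts, then hand the summary to done(). makeRequest is a hypothetical
// hook: any function that calls cb(err) when the attempt finishes.
function pollSite(makeRequest, count, delayMs, done) {
  var outcomes = [];
  function attempt() {
    makeRequest(function (err) {
      outcomes.push(!err);
      if (outcomes.length >= count) {
        done(summarizeOutcomes(outcomes));
        return;
      }
      setTimeout(attempt, delayMs);
    });
  }
  attempt();
}

// Possible wiring with the request module from the script above:
// var request = require('request');
// pollSite(function (cb) {
//   request('http://www.supremenewyork.com/', cb);
// }, 20, 5000, console.log);
```

Long runs of errors separated by short runs of successes would fit a block that is applied and lifted over time, though that reading is a guess and not something this sketch can prove.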
On my own computer (Mac running OS X El Capitan), running the js file for the given site works with 100% success (i.e. I've never run into this problem before)... so I'm not sure why the problem is contained to my droplet.
Any pointers would be appreciated.
Research/Similar Posts:
Node.js 0.4.10. http get( ) request " ETIMEDOUT Connection timed out " frequently
Why can't I ping herokuapp <-- starting to get a better picture of what's going on here...
Problem with http GET request on node js <-- seemed helpful at first (later realized setting User-Agent probably does nothing significant)
Additional Info
I also feel it's worth mentioning that the site I'm trying to make requests to actively has a problem with scripts and web scrapers, so I wouldn't be surprised if they tried everything in the book to prevent this from taking place.
Possible Causes
IP address blocking --> I initially ruled this out because I would still occasionally get responses from the server, but I am no longer able to get any sort of response from it. This might be the cause, but I am really confused about how they might be doing this: no issues on my local machine, no issues requesting their page from a browser on my droplet, yet the script times out.
'Rate-limiting' of my requests --> if this is somehow the case, I would like to know why this is happening specifically on my server and not, say, on my local machine.
The manner in which I'm making my requests (i.e. not through a browser). --> I don't think this is the case because I can run the first script with a 100% response rate on my local computer (unless there is something my local computer does before sending my request to their server).
The system itself. I've only tested the first script on my Mac. Perhaps the code runs differently on different OS's/systems..?
Diagnosing with traceroute
So as per @RabeeAbdelWahab's suggestion, I attempted to diagnose the problem with traceroute. However, I have practically no knowledge of networks, so I'm not sure how to proceed. Here's an example output:
traceroute to <> (XXX.XXX.XXX.XXX), 30 hops max, 60 byte packets
1 45.55.192.254 (45.55.192.254) 8.903 ms 8.879 ms 8.865 ms
2 162.243.188.229 (162.243.188.229) 1.028 ms 162.243.188.233 (162.243.188.233) 0.986 ms 1.004 ms
3 xe-0-9-0-17.r08.nycmny01.us.bb.gin.ntt.net (129.250.204.113) 1.923 ms 1.918 ms nyk-b3-link.telia.net (62.115.45.5) 1.587 ms
4 ae-11.amazon.nycmny01.us.bb.gin.ntt.net (129.250.201.138) 1.935 ms ae-10.amazon.nycmny01.us.bb.gin.ntt.net (129.250.201.134) 1.586 ms *
5 nyk-b5-link.telia.net (213.155.131.137) 1.822 ms * *
6 * * 62.115.32.130 (62.115.32.130) 1.361 ms
7 * * *
8 * * *
9 * * *
10 54.239.110.157 (54.239.110.157) 33.817 ms * 54.239.110.133 (54.239.110.133) 27.683 ms
11 54.239.111.17 (54.239.111.17) 8.193 ms 205.251.244.128 (205.251.244.128) 7.883 ms 54.239.111.23 (54.239.111.23) 9.319 ms
12 205.251.245.55 (205.251.245.55) 8.253 ms 54.239.110.175 (54.239.110.175) 24.601 ms 205.251.244.195 (205.251.244.195) 8.250 ms
13 * 54.239.111.27 (54.239.111.27) 9.319 ms 54.239.111.29 (54.239.111.29) 9.290 ms
14 * * *
15 54.239.111.23 (54.239.111.23) 9.136 ms * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
So after running traceroute several more times, I notice the following patterns:
The "* * *" outputs begin at some point on or slightly after the 15th hop.
The last IP address before the "* * *" hops mostly seems to alternate between the same two addresses: 205.251.XXX.XXX (slightly more often the case) or 54.239.XXX.XXX. In a few select instances I'll get an address like 72.21.222.155.
In addition, I have seen no differences when:
Running traceroute with the -m 255 option (i.e. max number of hops).
Running traceroute with the -I option.
Running traceroute with the -e option.
Running traceroute with the -p 80 or -p 25 options.
Running traceroute on a different droplet located in the same data center as the droplet in question.
Diagnosing with ping
Using ping, here's a running list of sites I can and cannot connect to:
Can connect:
google.com
facebook.com
reddit.com
github.com
stackoverflow.com
youtube.com
twitter.com
Can't connect:
amazon.com
microsoft.com
apple.com
walmart.com
paypal.com
cnn.com
nyt.org
wolframalpha.com
Observations: Is there a reason why I seem to be able to connect to sites that have 'social' features (and otherwise not)?
Apparently, it's common for sites not to reply to ICMP (which ping uses for its probes and traceroute relies on for the replies from each hop). Please disregard the above...
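Since ICMP replies say little about whether port 80 is actually reachable, another option is to test the TCP connection directly, which is the step that fails in the ETIMEDOUT error (syscall: 'connect'). Below is a minimal sketch using Node's core net module. The function name checkTcpConnect and the timeout value in the usage comment are my own assumptions, not part of the request module.

```javascript
var net = require('net');

// Try to open a TCP connection to host:port and report how it ended.
// callback(err, elapsedMs): err is null if the connection was established.
function checkTcpConnect(host, port, timeoutMs, callback) {
  var started = Date.now();
  var finished = false;
  var socket = net.connect(port, host);

  // Make sure the callback fires only once and the socket is cleaned up.
  function finish(err) {
    if (finished) {
      return;
    }
    finished = true;
    socket.destroy();
    callback(err, Date.now() - started);
  }

  // Give up if nothing happens on the socket within timeoutMs.
  socket.setTimeout(timeoutMs);
  socket.on('connect', function () {
    finish(null);
  });
  socket.on('timeout', function () {
    finish(new Error('connect timed out after ' + timeoutMs + ' ms'));
  });
  socket.on('error', finish);
}

// Possible usage against the site from the question:
// checkTcpConnect('www.supremenewyork.com', 80, 5000, function (err, ms) {
//   console.log(err ? err.message : 'connected in ' + ms + ' ms');
// });
```

If this times out from the droplet while a browser on the same droplet can load the page, that would be worth comparing, since both should be opening a connection to the same port.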
Additional findings
So I've noticed that if I modify my request to take an additional 'User-Agent' header (code example provided below), I'm able to initially get back the html response.
var request = require('request');
var requestOptions = {
  url: 'http://www.supremenewyork.com/some/route',
  headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
};
request(requestOptions, function(err, res, body) {
  if (err) {
    console.log(err);
    return;
  }
  console.log('body:', body);
});
I'm actually able to get back a response using the above method a few times. Afterwards, it seems all my connections lead to the aforementioned ETIMEDOUT error. Then I'll have to wait some lengthy period of time and it's rinse, wash, and repeat.
I actually performed a simple two-tailed proportional test for the above (i.e. receiving a response with and without a 'User-Agent' header) and got a p-value of 0.8493... so no statistical significance between the two. Again, please disregard the aforementioned...
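For reference, a two-tailed test on two proportions can be sketched as below. This assumes the usual pooled z-test, which may not be exactly the procedure I used, and the counts in the usage comment are made up for illustration; they are not my recorded data. The function names are hypothetical as well.

```javascript
// Standard normal CDF, using the Abramowitz-Stegun approximation of erf
// (formula 7.1.26, absolute error around 1.5e-7).
function normalCdf(z) {
  var x = Math.abs(z) / Math.SQRT2;
  var t = 1 / (1 + 0.3275911 * x);
  var poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  var erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Two-tailed z-test for the difference between two proportions, using the
// pooled proportion for the standard error. Returns NaN values if every
// attempt in both groups succeeded or every attempt failed.
function twoProportionTest(successA, totalA, successB, totalB) {
  var pA = successA / totalA;
  var pB = successB / totalB;
  var pooled = (successA + successB) / (totalA + totalB);
  var se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  var z = (pA - pB) / se;
  return { z: z, pValue: 2 * (1 - normalCdf(Math.abs(z))) };
}

// Hypothetical counts: 5 of 20 responses with the header, 6 of 20 without.
// console.log(twoProportionTest(5, 20, 6, 20));
```

A large p-value here only says the data do not show a difference between the two groups; with small samples it cannot rule one out.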
request package? – Schumer
http.get), but to no avail. Would there be a difference between what I tried and what you suggested? – Rube