HTTP GET on craigslist blocked


Asked 14/1, 2013 at 23:54 Answered 21/1, 2013 at 14:32

Solved ruby-on-rails amazon-ec2 craigslist

I'm trying to do an HTTP GET against Craigslist (sfbay.craigslist.org). Here is my Ruby code, which is really simple:

require 'net/http'
result = Net::HTTP.get(URI.parse('http://sfbay.craigslist.org'))

I end up getting an error "This IP has been automatically blocked."
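To see exactly what comes back, it can help to look at the status code alongside the body instead of only the body string. The following is a minimal sketch, not a definitive check: `blocked_body?` and `check_access` are hypothetical helper names, and the matched phrase is simply the message quoted above, so the real page may word it differently.

```ruby
require 'net/http'
require 'uri'

# Phrase quoted in the question; the exact wording served by the site is an assumption.
BLOCK_PHRASE = 'This IP has been automatically blocked'

# Hypothetical helper: true when a response body contains the block notice.
def blocked_body?(body)
  body.to_s.include?(BLOCK_PHRASE)
end

# Hypothetical helper: fetches the URL and returns the status code plus a block flag.
def check_access(url)
  response = Net::HTTP.get_response(URI.parse(url))
  { status: response.code, blocked: blocked_body?(response.body) }
end

# Example call (performs a real request):
# p check_access('http://sfbay.craigslist.org')
```

Running the same call on EC2 and on a local machine makes the difference between the two environments easy to compare.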

This only happens when I run the code from Amazon EC2 or Heroku. When I run it on my own computer (localhost), I get the correct result. Does this have to do with Amazon EC2?

I'm wondering if other people have had the same issue. What can I do to access craigslist from EC2?

Durman answered 14/1, 2013 at 23:54 Comment(0)

I can confirm that Craigslist is blocking requests from the major Amazon EC2 IP ranges, and that the block is by IP address, not by user agent. Requests work from elsewhere, though I suspect any significant volume would get other IPs blocked too.
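One way to check the "by IP, not by user agent" part yourself is to send the same request from the same machine with different User-Agent headers and compare the responses. This is a rough sketch under stated assumptions: `build_request` and `fetch_with_agent` are hypothetical helper names, and the agent strings in the example are placeholders, not values known to be treated specially.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request carrying the given User-Agent.
def build_request(uri, user_agent)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = user_agent
  request
end

# Hypothetical helper: sends the request and returns the status code and body size.
def fetch_with_agent(url, user_agent)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_request(uri, user_agent))
  end
  [response.code, response.body.to_s.length]
end

# Example comparison (performs real requests); the agent strings are placeholders:
# ['Ruby', 'Mozilla/5.0 (X11; Linux x86_64)'].each do |agent|
#   p fetch_with_agent('http://sfbay.craigslist.org', agent)
# end
```

If every agent string gets the same blocked response from one host, the user agent is unlikely to be what triggers the block.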

You could step around it with Tor. More significantly, this Stack Overflow question discusses data sources used by Craigslist mashups.

I even tested an EC2 instance in Brazil, assuming they might not have all the CIDRs blocked. No bueno.

Throng answered 21/1, 2013 at 14:32 Comment(1)

Isn't this unethical? It is clearly not blocking Google or Bing or of course, Yahoo! – Booth 17/7, 2016 at 22:26
