AWS: dynamically allocate & associate new IP addresses to EC2 instance?

I am running some web crawling jobs on an AWS-hosted server. The crawler scrapes data from an eCommerce website, but it has recently started getting "timeout errors" from the site. The website may be throttling my requests based on my IP address. Allocating a new Elastic IP address solves the problem, but not for long.

My Question: Is there any service that I can use to automatically and dynamically allocate & associate new IPs to my instance? Thanks!

Pyxie answered 8/4, 2014 at 16:23 Comment(7)
Did you consider using Tor? Packston
@Gas Thanks! Does it work if I choose not to use the Tor browser? My crawler (written in Java) sends HTTP requests directly to the target website instead of invoking a real browser. Pyxie
docs.aws.amazon.com/cli/latest/reference/ec2/… Violetvioleta
@UriAgassi, thanks. I know I can allocate a new Elastic IP in the Admin Console or with a CLI tool. Is there a tool that does this automatically, or do I need to write my own scripts? Thanks. Pyxie
You should write your own scripts. Violetvioleta
If the website is blocking you, they are doing so because they think you are causing problems for the site. Your best options are to get permission to crawl the site (preferably with API access), or change your strategy to crawl more slowly. Erma
@datasage, good point. API access was my initial thought, but the target site doesn't provide an API for data collection. Also, the data needs to be collected within a day or two at the end of every month, so a slow, gentle strategy doesn't work. Thanks. Pyxie

To change the EIP, you can use the Python boto library.

Something like this:

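#!/usr/bin/python

import boto.ec2

# Connect to the region that hosts the instance
conn = boto.ec2.connect_to_region("us-east-1",
    aws_access_key_id='<key>',
    aws_secret_access_key='<secret>')

# Look up the instance to find its current public IP
reservations = conn.get_all_instances(filters={'instance-id' : 'i-xxxxxxxx'})
instance = reservations[0].instances[0]

old_address = instance.ip_address
# Allocate a fresh Elastic IP
new_address = conn.allocate_address().public_ip

# Swap the addresses (passing public IPs works for EC2-Classic;
# in a VPC, use association_id / allocation_id instead)
conn.disassociate_address(old_address)
conn.associate_address('i-xxxxxxxx', new_address)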
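
The same swap can also be sketched with boto3, the successor to boto. This is not part of the original answer: it assumes a VPC instance, a client created with something like boto3.client("ec2", region_name="us-east-1"), and a hypothetical helper name rotate_elastic_ip. Unlike the script above, it also releases the old address, on the assumption that you do not want unused Elastic IPs piling up in your account.

```python
def rotate_elastic_ip(ec2, instance_id):
    """Swap the instance's Elastic IP for a freshly allocated one.

    `ec2` is assumed to behave like a boto3 EC2 client.
    Returns the new public IP as a string.
    """
    # Find the Elastic IP(s), if any, currently associated with the instance.
    current = ec2.describe_addresses(
        Filters=[{"Name": "instance-id", "Values": [instance_id]}]
    )["Addresses"]

    # Allocate the replacement first, so a failure here leaves the old IP in place.
    new = ec2.allocate_address(Domain="vpc")

    # Detach the old address(es) from the instance.
    for old in current:
        ec2.disassociate_address(AssociationId=old["AssociationId"])

    # Attach the new address.
    ec2.associate_address(InstanceId=instance_id,
                          AllocationId=new["AllocationId"])

    # Release the old address(es) so they no longer count against the quota.
    for old in current:
        ec2.release_address(AllocationId=old["AllocationId"])

    return new["PublicIp"]
```

To run this automatically, you could call rotate_elastic_ip from your crawler whenever it sees repeated timeouts, or from a scheduled job.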
Switchback answered 8/4, 2014 at 19:47 Comment(0)

If you want to use the Tor network, just run:

sudo apt-get install tor 
sudo /etc/init.d/tor start

netstat -ant | grep 9050  # check that Tor is listening on its SOCKS port

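Then, in your Java project, set the proxy:

public static void main(String[] args) {
    // Route java.net socket connections through the local Tor SOCKS proxy
    System.setProperty("socksProxyHost", "127.0.0.1");
    System.setProperty("socksProxyPort", "9050");
    // ... start the crawler here ...
}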

You can schedule a cron job that restarts your application and Tor every XX interval.

Easy and secure.

Packston answered 8/4, 2014 at 20:22 Comment(1)
Great, thanks for this nice alternative. It sounds like this approach requires modifying my Java code and restarting Tor. Allocating a new IP is easier to maintain. I'll check this out if that doesn't work well. Pyxie
