R Change IP Address programmatically

I am currently changing the user agent by passing different strings to html_session().

Is there also a way to change your IP address on a timer when scraping a website?

Supernaturalism answered 4/1, 2017 at 14:29 Comment(4)
This sounds an awful lot like a method for circumventing a website's terms of use ...Joelynn
Take a look here: google-scraper.squabbel.com. It is dedicated to Google scraping, but it will help with your question as well, since the information applies to almost any website; most are easier than Google.Glee
You can use Tor with Privoxy, or Tor directly, for this purpose. Note: I personally believe there is nothing unethical in circumventing a website restriction. Obviously you should not take advantage of the process and make unnecessarily numerous hits to the target webpage.Cineraria
Thank you guys. Do you know of a good guide for that using R, @IndranilGayen? Failing that, I could always use Python.Supernaturalism

You can use a proxy (which changes the IP address the website sees) via use_proxy as follows:

html_session("your-url", use_proxy("proxy-ip", port))

For more details see: ?httr::use_proxy
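
To change the proxy while scraping, one option is to keep a pool of proxies and cycle through it, pausing between requests. Below is a minimal sketch of that idea, not a definitive implementation. The pool addresses, the function names pick_proxy and scrape_rotating, and the delay value are all made up for illustration; only httr::GET and httr::use_proxy come from httr. Note that it switches proxy per request rather than on a wall-clock timer.

```r
# Requires the httr package to be installed for the default `fetch`.

# Hypothetical proxy pool (documentation-range addresses, not real proxies).
# Replace with proxies you are actually allowed to use.
proxies <- data.frame(
  ip   = c("203.0.113.10", "203.0.113.11", "203.0.113.12"),
  port = c(8080, 3128, 8080),
  stringsAsFactors = FALSE
)

# Return the pool row to use for the i-th request, wrapping around at the end.
pick_proxy <- function(i, pool) {
  pool[((i - 1) %% nrow(pool)) + 1, ]
}

# Fetch each URL through the next proxy in the pool, sleeping `delay` seconds
# between requests. `fetch` is a parameter so the rotation logic can be tried
# without touching the network.
scrape_rotating <- function(urls, pool, delay = 5,
                            fetch = function(url, ip, port) {
                              httr::GET(url, httr::use_proxy(ip, port))
                            }) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    p <- pick_proxy(i, pool)
    results[[i]] <- fetch(urls[[i]], p$ip, p$port)
    if (i < length(urls)) Sys.sleep(delay)
  }
  results
}
```

Something like scrape_rotating(c("https://example.com/a", "https://example.com/b"), proxies) would then send the first request through the first proxy and the second through the next one. With rvest you could swap the default fetch for a function that calls html_session(url, use_proxy(ip, port)) instead.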

To check if it is working you can do the following:

library(httr)

content(GET("https://ifconfig.co/json"), "parsed")
content(GET("https://ifconfig.co/json", use_proxy("138.201.63.123", 31288)), "parsed")

The first call will return your own IP. The second call should return 138.201.63.123 as the IP.
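
If you want to run that check repeatedly, it can be wrapped in a small helper. This is only a sketch under some assumptions: the names reported_ip and proxy_changes_ip are made up here, and it assumes the echo service answers with JSON containing an ip field, as the calls above suggest.

```r
# Requires the httr package to be installed.

# Return the IP reported by the echo service, optionally through a proxy.
# Assumes the response is JSON with an `ip` field.
reported_ip <- function(proxy_ip = NULL, proxy_port = NULL,
                        echo_url = "https://ifconfig.co/json") {
  cfg <- if (is.null(proxy_ip)) {
    httr::config()                          # empty config: direct connection
  } else {
    httr::use_proxy(proxy_ip, proxy_port)   # route the request via the proxy
  }
  resp <- httr::GET(echo_url, cfg, httr::timeout(10))
  httr::stop_for_status(resp)
  httr::content(resp, "parsed")$ip
}

# TRUE if the IP seen through the proxy differs from the direct one.
# `lookup` is a parameter so the comparison can be tried offline.
proxy_changes_ip <- function(proxy_ip, proxy_port, lookup = reported_ip) {
  direct  <- lookup()
  proxied <- lookup(proxy_ip, proxy_port)
  !identical(direct, proxied)
}
```

Calling proxy_changes_ip("138.201.63.123", 31288) would then return TRUE only while that proxy is still alive and forwarding requests.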

This proxy was taken from http://proxylist.hidemyass.com/ - no guarantees for anything...

Favata answered 4/1, 2017 at 14:41 Comment(8)
Thank you. Are there any restrictions on the IP address or port number that can be used?Supernaturalism
@Supernaturalism What would be such a restriction?Semantics
@Supernaturalism It has to be a valid URL of a proxy server. If you want to use a SOCKS proxy, use something like use_proxy("socks://127.0.0.1", 9050)Favata
So for instance it could be any currently valid entry on the socks-proxy.net website?Supernaturalism
Thank you. Sending one request gets me a robot check. Do you know how to view the information sent in the request?Supernaturalism
Have a look at ?verboseFavata
I have tried the solution and it does not work for me. html_session("https://www.maxmodels.pl", use_proxy("95.171.198.206", 8080)) generated the error Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached: Connection timed out after 10000 millisecondsMailer
Are you sure the proxy is working correctly? Oftentimes proxies from the web are outdated or blocked by the other site. Did it work when using the proxy e.g. via curl in the shell?Favata
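
Since public proxies time out so often, it may help to try them one by one and keep the first that answers. The following is a rough sketch of that approach; first_working_proxy and the pool addresses are hypothetical names and values chosen for illustration, while tryCatch is base R and GET, use_proxy and timeout come from httr.

```r
# Requires the httr package to be installed for the default `fetch`.

# Hypothetical candidate proxies (documentation-range addresses).
pool <- data.frame(
  ip   = c("203.0.113.10", "203.0.113.11"),
  port = c(8080, 3128),
  stringsAsFactors = FALSE
)

# Try each proxy in turn and return the first pool row whose request completes
# without an error (e.g. a timeout). Returns NULL if none of them work.
first_working_proxy <- function(url, pool, timeout_sec = 10,
                                fetch = function(url, ip, port, timeout_sec) {
                                  httr::GET(url, httr::use_proxy(ip, port),
                                            httr::timeout(timeout_sec))
                                }) {
  for (i in seq_len(nrow(pool))) {
    ok <- tryCatch({
      fetch(url, pool$ip[i], pool$port[i], timeout_sec)
      TRUE
    }, error = function(e) {
      message("Proxy ", pool$ip[i], ":", pool$port[i],
              " failed: ", conditionMessage(e))
      FALSE
    })
    if (ok) return(pool[i, ])
  }
  NULL
}
```

To see what is actually sent through a proxy, you can add httr::verbose() to a single request, e.g. httr::GET(url, httr::use_proxy(ip, port), httr::verbose()).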
