Selenium app redirect to Cloudflare page when hosted on Heroku

I made a Discord bot that uses Selenium to access a website and fetch information. When I run the code locally I have no problems, but when I deploy to Heroku the first URL I request redirects me to the page "Attention Required! | Cloudflare".

I have tried:

And many others, all with the same settings that I use:

import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Chrome binary location is provided by the Heroku config var
options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
# Try to hide the usual automation markers
options.add_experimental_option("excludeSwitches", ["enable-logging", "enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")
# Flags needed to run Chrome on the dyno
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
self.driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), options=options)
# Replace the default (headless) user agent with a regular desktop one
self.driver.execute_cdp_cmd('Network.setUserAgentOverride', {
    "userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})

but this does not work; the code only works when I run it locally.

PS: locally I'm on Windows

Source of the page I'm redirected to: https://gist.github.com/rafalou38/9ae95bd66e86d2171fc8a45cebd9720c

Fredericfrederica answered 15/11, 2020 at 9:25 Comment(3)
What's the content of the page you're redirected to? Could it be that it's Cloudflare's WAF challenging you to prove you're a human and not a bot? - Renner
here: gist.github.com/rafalou38/9ae95bd66e86d2171fc8a45cebd9720c - Fredericfrederica
You may want to whitelist your Heroku machine's IP address in Cloudflare Page Rules so that it doesn't trigger Captcha checks. See this answer: #50329349 - Renner

If the browsing context started by the Selenium-driven ChromeDriver is redirected to the page...

Attention Required! | Cloudflare

...this implies that Cloudflare is blocking your program from accessing the AUT (Application Under Test).
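
To confirm from code that the bot actually landed on the block page, a check on the page title may be enough. The following is only a minimal sketch: the helper name `looks_like_cloudflare_block` is hypothetical, and the only marker it knows is the title quoted in the question, so other Cloudflare pages may not be recognized.

```python
# Title of the block page as quoted in the question (the only known marker here).
CLOUDFLARE_BLOCK_MARKERS = ("Attention Required! | Cloudflare",)


def looks_like_cloudflare_block(page_title: str) -> bool:
    """Return True if the title contains a known Cloudflare block-page marker."""
    title = page_title.strip().lower()
    return any(marker.lower() in title for marker in CLOUDFLARE_BLOCK_MARKERS)
```

With Selenium this could be called as `looks_like_cloudflare_block(driver.title)` right after `driver.get(...)`, so the bot can log or report the block instead of parsing the wrong page.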


Analysis

Cloudflare can deny access for several reasons, for example:

  • Cloudflare is trying to counter a possible dictionary attack.
  • Your system's IP address is blacklisted by Cloudflare for mining Bitcoin or Monero on your system.

In these cases you are eventually redirected to a captcha page.


Solution

In these cases, a potential solution is to use undetected-chromedriver to initialize the Chrome browsing context.

undetected-chromedriver is an optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

  • Code Block:

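    import undetected_chromedriver as uc
    from selenium import webdriver

    # Regular ChromeOptions, passed to the patched driver
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    # uc.Chrome downloads the driver binary and patches it automatically
    driver = uc.Chrome(options=options)
    driver.get('https://bet365.com')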
    

Alternate Solution

An alternative solution would be to whitelist your IP address through the Project Honey Pot website. The end-to-end process is detailed in the video titled "Attention Required one more step captcha CloudFlare Error".

Julieannjulien answered 22/11, 2020 at 22:11 Comment(4)
Honey Pot does not have any data on the IP: projecthoneypot.org/ip_3.80.128.77 - Fredericfrederica
@Fredericfrederica Check out the updated answer and let me know the status. - Julieannjulien
It is still not working. I first tried exactly what you posted and then with the arguments I had before, but it did not help and I still get this Cloudflare page. - Fredericfrederica
I got the following error when using driver = uc.Chrome(options=options): *** Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: <module>) *** - Pedi

I used "undetected_chromedriver" and the following setup worked for me:

Used the buildpacks:

Added the config vars:

  • CHROMEDRIVER_PATH=/app/.chromedriver/bin/chromedriver
  • GOOGLE_CHROME_BIN=/app/.apt/usr/bin/google-chrome
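
Since both paths come from config vars, it can help to fail early with a clear message when one of them is not set on the dyno. This is just a sketch under that assumption: `read_chrome_paths` is a hypothetical helper, not part of Selenium or undetected_chromedriver, and it only checks that the two variables named above are present.

```python
import os

# The two config vars listed above.
REQUIRED_VARS = ("CHROMEDRIVER_PATH", "GOOGLE_CHROME_BIN")


def read_chrome_paths(env=None):
    """Return the Chrome/ChromeDriver paths from the environment, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing config vars: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be used where the snippet calls `os.environ.get(...)`, so a missing variable shows up as an explicit error rather than as a `None` path.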

Code snippet:

import undetected_chromedriver as uc
from selenium import webdriver
import os

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = uc.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), options=options)
Marrowfat answered 29/1, 2022 at 18:32 Comment(0)

I know this is not an actual solution, but sometimes Cloudflare blocks you based on your location, which it determines from your IP address. My code was working perfectly locally, but not on Heroku.

It turns out the code was right, using the solution provided by DebanjanB. The issue is that Heroku's server runs in a different country than mine. I confirmed this by asking a friend who lives in another country to open the website on a phone: Cloudflare blocked my friend and asked for a captcha.

I still haven't solved this. I'm not an expert and the workaround seems complicated. I guess a proxy could solve it?

I'll update if I get around it.
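
For reference, here is a rough sketch of what the proxy idea could look like, untested against Cloudflare. Chrome accepts a `--proxy-server` command-line switch; the helper `proxy_argument` and the host `proxy.example.com` are hypothetical placeholders, and you would need a real proxy located in a country the site accepts.

```python
def proxy_argument(host: str, port: int, scheme: str = "http") -> str:
    """Build the Chrome --proxy-server switch for the given (placeholder) proxy."""
    if not host:
        raise ValueError("proxy host must not be empty")
    if not 0 < port < 65536:
        raise ValueError("proxy port must be between 1 and 65535")
    return f"--proxy-server={scheme}://{host}:{port}"
```

It would be passed like the other switches, for example `options.add_argument(proxy_argument("proxy.example.com", 8080))`, before creating the driver.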

Atharvaveda answered 24/6, 2021 at 2:12 Comment(1)
Yes, I think a proxy or a VPN may be the only solution to get around this problem. - Fredericfrederica
