How to bypass Cloudflare bot protection in Selenium

I need to grab some information from a site, purely for educational purposes, but I cannot send requests because of its protection. The typical "Checking your browser" page shows up first, and then I'm redirected repeatedly. How can I bypass this protection in Python Selenium?

Cassilda answered 30/4, 2021 at 22:59 Comment(0)

I had this problem a long time ago and I was able to solve it. Use the code below and enjoy :)

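from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
# Remove the automation infobar and the automation extension
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")
# Selenium 4 takes the driver path via Service (executable_path was removed in 4.10)
driver = webdriver.Chrome(service=Service(r"webdriver\chromedriver.exe"), options=options)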

EDIT: this approach no longer works!

Quadrennium answered 30/4, 2021 at 23:1 Comment(2)
Usually, it takes around 5 seconds for Cloudflare verification. You need to add a timer before you get page_source. Just add time.sleep(6) before driver.page_source. - Kikuyu
This does not work. - Joannajoanne
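The first comment suggests a fixed time.sleep(6). A slightly more robust variant is to poll until the page no longer looks like the challenge page, then read page_source. This is a minimal sketch under assumptions, not a guaranteed fix: the title markers ("just a moment", "checking your browser") are assumed strings that Cloudflare may change, and wait_for_challenge_to_clear / looks_like_challenge are hypothetical helpers. The title reader is passed in as a callable so the logic can be tried without a browser; with Selenium you would pass `lambda: driver.title`.

```python
import time
from typing import Callable

# Assumption: lower-cased substrings of the <title> shown on Cloudflare's
# "checking your browser" interstitial. Cloudflare changes this wording over
# time, so treat the tuple as a starting point, not a definitive list.
CHALLENGE_TITLE_MARKERS = ("just a moment", "checking your browser")


def looks_like_challenge(title: str) -> bool:
    """Return True if the page title contains one of the assumed markers."""
    lowered = title.lower()
    return any(marker in lowered for marker in CHALLENGE_TITLE_MARKERS)


def wait_for_challenge_to_clear(
    get_title: Callable[[], str],
    timeout: float = 15.0,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_title() until it stops looking like the challenge page.

    Returns True as soon as the title changes, or False once `timeout`
    seconds have passed while the challenge title is still showing.
    """
    deadline = clock() + timeout
    while True:
        if not looks_like_challenge(get_title()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

Usage with Selenium would look like `if wait_for_challenge_to_clear(lambda: driver.title): html = driver.page_source`. Polling returns as soon as the real page has loaded instead of always paying the full six seconds, and the False result makes it explicit when the check never passed (for example, the redirect loop described in the question).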

Use the undetected_chromedriver pip package. It is a very simple package that makes the automated browser look like a regular client.

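import undetected_chromedriver


def init_webdriver(url):
    driver = undetected_chromedriver.Chrome()
    try:
        driver.get(url)
        # page_source holds the rendered HTML (there is no page_content attribute)
        content = driver.page_source
    finally:
        # quit() closes every window and ends the driver session
        driver.quit()
    return content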

You can also run it in the background (headless):

import undetected_chromedriver
from selenium import webdriver


def init_webdriver(url):
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    # Pass the options by keyword so they are not mistaken for another parameter
    driver = undetected_chromedriver.Chrome(options=options)
    try:
        driver.get(url)
        content = driver.page_source
    finally:
        driver.quit()
    return content

I tested it on 28 June 2022, and it works very well.

Forint answered 28/6, 2022 at 12:0 Comment(1)
I got an error: "RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase." - Usury
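That message comes from Python's multiprocessing module: under the "spawn" start method (the default on Windows and macOS) a child process re-imports the main script, and if the script starts a process at import time, the child tries to do the same while it is still bootstrapping. Assuming the driver (or something it uses) starts a helper process this way in your version, the usual remedy is to put the entry point behind an `if __name__ == "__main__":` guard. A minimal stdlib-only sketch of the pattern, where the hypothetical scrape worker stands in for the Selenium code:

```python
import multiprocessing


def scrape(url, results):
    # Hypothetical worker: in a real script this is where the Chrome driver
    # would be created and driver.get(url) called. Here it just reports back.
    results.put(f"fetched {url}")


def main():
    results = multiprocessing.Queue()
    worker = multiprocessing.Process(target=scrape, args=("https://example.com", results))
    worker.start()
    value = results.get(timeout=10)  # read before join() so the child can flush the queue
    worker.join()
    return value


# Under "spawn" the child re-imports this file. The guard keeps that re-import
# from calling main() again, which is what triggers the bootstrapping RuntimeError.
if __name__ == "__main__":
    print(main())
```

The same applies to a script that only calls undetected_chromedriver.Chrome(): move the driver creation into a function and call it from inside the guard.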

A bit of a late response, but it's no wonder developers keep running into this issue. I am using Java 17 and Gradle 7.4.2. My solution is related to the one explained above, but the code is in Java.

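import io.github.bonigarcia.wdm.WebDriverManager;
import org.junit.Before;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class CloudflareTest { // hypothetical test class name
    private WebDriver driver;

    @Before
    public void setup() {
        WebDriverManager.chromedriver().setup();
        ChromeOptions options = new ChromeOptions();
        // Bypass Cloudflare checks
        options.setExperimentalOption("useAutomationExtension", false);
        options.addArguments("--disable-blink-features=AutomationControlled");
        driver = new ChromeDriver(options);
        driver.manage().window().maximize();
    }
}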

Please refer to the Selenium ChromeOptions documentation for further details.

Happy coding.

Illumine answered 26/4, 2022 at 22:58 Comment(1)
Thanks @spicy-strike, this was driving me crazy. This piece of code saved my life. - Picador

As of March 2022:

Hi, I had the same problem when using headless Selenium on a Docker Linux image.

I solved it by creating a virtual display right before calling the webdriver:

from pyvirtualdisplay import Display
display = Display(visible=False, size=(800, 800))
display.start()

Don't forget to install both pyvirtualdisplay and xvfb: pip install pyvirtualdisplay and sudo apt-get install xvfb

You must also remove the "headless" option from ChromeDriver. Here is the complete code I use:

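from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Virtual display in order to avoid Cloudflare bot detection
display = Display(visible=False, size=(800, 800))
display.start()

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('start-maximized')
options.add_argument('enable-automation')
options.add_argument('--disable-infobars')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-browser-side-navigation')
options.add_argument("--remote-debugging-port=9222")
# options.add_argument("--headless")  # keep disabled: Chrome renders into the virtual display
options.add_argument('--disable-gpu')
options.add_argument("--log-level=3")
# Current Selenium 4 releases take the driver path via Service and the options via options=
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)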

Since it worked nicely without headless mode on my local computer, I figured emulating a real display might do the job as well. I do not fully understand why, but from what I understand, Cloudflare runs JavaScript in the browser to confirm you're not a bot, and having an emulated display helps that check pass.

Comatose answered 16/3, 2022 at 17:4 Comment(1)
pyvirtualdisplay doesn't export Display now. So the import line should be: from pyvirtualdisplay.display import Display. Also visible should be set to False or you'll get a type error. - Quirinus

I am not providing a new kind of answer, just sharing my up-to-date experience for others' reference. I know the space is dynamic, but as of 26/11/2023 the code below seems to work for me. I am trying to scrape a popular property website because I am looking for a property to rent. If I open it with the Edge driver, I get the common Cloudflare bot-detection message and a checkbox CAPTCHA, which keeps looping:

"site.com needs to review the security of your connection"

If I open it with undetected_chromedriver instead, it runs without being detected:

from selenium import webdriver
import undetected_chromedriver as uc

my_options = webdriver.ChromeOptions()
my_options.add_argument( '--log-level=3' )
my_options.add_argument( '--no-sandbox' )
my_options.add_argument( '--disable-dev-shm-usage' )
my_options.add_argument( '--disable-blink-features=AutomationControlled' )
my_options.add_argument( 'user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36' )
my_options.add_argument( '--no-first-run' ) # this might be specific to undetected_chromedriver.v2 only
my_options.add_argument( '--no-service-autorun' ) # this might be specific to undetected_chromedriver.v2 only
my_options.add_argument( '--password-store=basic' ) # this might be specific to undetected_chromedriver.v2 only
#my_options.add_experimental_option( 'useAutomationExtension', False )
#my_options.add_experimental_option( 'excludeSwitches', ( 'enable-automation', ) )
my_options.add_argument( '--start-maximized' )
my_options.add_argument( '--blink-settings=imagesEnabled=false' )
my_options.headless = False
my_options.page_load_strategy = 'normal'

my_driver = uc.Chrome( options = my_options, version_main = 109 )
my_driver.get( 'about:blank' )
my_driver.execute_script( "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})" )
my_driver.get( ..............

Note: the two commented-out lines are not a typo; it works even with them commented out.

Versioning: Windows 7 x64, Python 3.8.10, Chrome x64 109.0.5414.120, undetected-chromedriver 3.5.4

Prey answered 26/11, 2023 at 15:40 Comment(0)
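The answer above pins version_main = 109 to match the installed Chrome 109.0.5414.120. If you would rather derive that number than hard-code it, one hedged option is to parse the output of `google-chrome --version`. The helper names below are hypothetical, and the binary name assumes a Linux install where Chrome is on PATH and prints its version to stdout.

```python
import re
import shutil
import subprocess
from typing import Optional

# A Chrome version looks like "109.0.5414.120"; the first field is the major version.
_VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def chrome_major_version(version_text: str) -> Optional[int]:
    """Return the major version from text such as "Google Chrome 109.0.5414.120"."""
    match = _VERSION_RE.search(version_text)
    return int(match.group(1)) if match else None


def installed_chrome_major(binary: str = "google-chrome") -> Optional[int]:
    """Run `<binary> --version` and parse it; return None if that is not possible.

    Assumption: a Linux-style install where the binary is on PATH and prints
    its version to stdout. On Windows, chrome.exe generally does not.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
    return chrome_major_version(result.stdout)
```

You would then call `uc.Chrome(options=my_options, version_main=installed_chrome_major())`. When the helper returns None, this is the same as leaving version_main at its default of None, so undetected_chromedriver falls back to its own version detection.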

SOLUTION JULY 2021

Just add a user-agent argument to the Chrome options and set the user agent to any value:

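from selenium.webdriver.chrome.options import Options
import undetected_chromedriver as uc

ops = Options()
ua = 'cat'
ops.add_argument('--user-agent=%s' % ua)
# Recent undetected_chromedriver releases take options= and driver_executable_path=
driver = uc.Chrome(options=ops, driver_executable_path=r"C:\chromedriver.exe")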

Turgite answered 15/7, 2021 at 8:6 Comment(1)
Not working with this either. I'm using a random user agent from fake_useragent, but it doesn't work. - Junior
