Selenium does not work with a chromedriver modified to avoid detection
I'm asking this because I'm aware of this thread and this thread, and of others on the same subject, but the solution everyone points to in the first thread no longer works. So please do not close this as a duplicate of that first thread. The answer there is from 2016, and you can see more recent comments from people having trouble.

I'm using Selenium to do some light web scraping. One site I'm interacting with is clearly detecting that my browser is automated (but curiously, only cares as long as I'm also accessing a version of the site outside my region, but that's neither here nor there).

The solution in the first thread suggests taking the chromedriver downloaded from here and modifying it: it says to get rid of variables with "$cdc_" in their names. So here is what I did. I downloaded v2.41 from that site and unzipped it. This version lets me use Chrome with Selenium via br = webdriver.Chrome('./chromedriver'), but it has the automation-detection problem. So I cp'd it to make chromedriver-modified.

I opened chromedriver-modified with vim and searched for $cdc. Around line 1934 I found a function similar to (but slightly different from) the one in the linked thread:

function getPageCache(opt_doc, opt_w3c) {
  var doc = opt_doc || document;
  var w3c = opt_w3c || false;
  // var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'xxxx_asdjflasutopfhvcZLmcfl_';
  // var key = 'randomblahhh_';
  if (w3c) {
    if (!(key in doc))
      doc[key] = new CacheWithUUID();
    return doc[key];
  } else {
    if (!(key in doc))
      doc[key] = new Cache();
    return doc[key];
  }
}

I've tried replacing this variable both with something random (the randomblahhh_ var) and with something that just replaces the first 4 characters of the $cdc one, because I saw both suggested in the comments in that thread (I don't know whether the variable's format matters here).

Neither works. By that I mean that when I try to run it with chromedriver-modified, the webdriver won't even start:

>>> from selenium import webdriver
>>> br = webdriver.Chrome(executable_path='./chromedriver-modified')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
    self.service.start()
  File "/usr/lib/python3/dist-packages/selenium/webdriver/common/service.py", line 96, in start
    self.assert_process_still_running()
  File "/usr/lib/python3/dist-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service ./chromedriver-modified unexpectedly exited. Status code was: -11

I've had trouble Googling and figuring out what this status code means. In fact, I found this unanswered reddit thread with the same exact problem.
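For what it's worth, Selenium launches chromedriver through Python's subprocess module, and on POSIX systems subprocess reports a child that was killed by signal N as return code -N. So "-11" most likely means the modified binary was killed by signal 11, SIGSEGV (a segmentation fault), which would fit a binary whose bytes were damaged by the edit. A minimal sketch for decoding such codes, assuming POSIX semantics (`describe_exit_code` is a hypothetical helper, not a Selenium API):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a subprocess return code (hypothetical helper, POSIX semantics)."""
    if code < 0:
        # subprocess encodes "terminated by signal N" as -N
        sig = signal.Signals(-code)
        return f"killed by signal {-code} ({sig.name})"
    return f"exited normally with status {code}"


print(describe_exit_code(-11))
```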

The first thread also mentions $wdc variables, but I find no mention of them in chromedriver.

Just to preempt a possible suggestion: I'm almost 100% confident that the site detects the automated browser from the automation itself, not from something like mouse-click speed. If I start the browser with Selenium but then do the rest manually, the problem still occurs.

edit: I'm using Chrome v68 from the Ubuntu repos, google-chrome-stable. To be honest I don't need to use Chrome specifically, but the answers I've found seem to center around it rather than Firefox.

edit2: one last comment -- I noticed in the first linked thread that some people were "recompiling":

For me, I used chrome, so, all that I had to do was to ensure that $cdc_ didn't exist anymore as document variable, and voila (download chromedriver source code, modify chromedriver and re-compile $cdc_ under different name.)

I'm not sure what that means -- are they recompiling Chrome itself? All I've done is change the variable in the chromedriver file.

Sherwood asked 9/8, 2018 at 15:16 Comment(6)
Of course you need to build chromedriver after editing the source code. - Ceaseless
@CoreyGoldberg thanks for the reply. I'm still a little confused here: if what I edited was the chromedriver source code, then when/where does it get built when I use it normally? - Sherwood
@CoreyGoldberg I found this thread that might be the right thing, but it's pretty clear that the OP there is asking about building chromedriver while the answer talks about building Chromium. - Sherwood
Hey @GrundleMoof, did you ever figure this out? I've only found one instance of $cdc myself, and no $wdc key. Perhaps the previous conversations were about older versions of chromedriver? I've also heard that once you are on a blacklist, you don't get off it, even if you are now doing everything correctly. Is it possible this is the wall you hit? - Checker
I have changed $cdc, but the website still detects it. I'm trying gosugamers.net - Bruno
Maybe rename the file to chromedriver? I know it sounds stupid, but to quote the first thread you linked: "After altering the chromedriver binary, make sure that the name of the altered chromedriver binary is chromedriver, and that the original binary is either moved from its original location or renamed." - Faa
Removal

There are multiple methods available:

1. Using the Chrome DevTools Protocol (CDP) with Page.removeScriptToEvaluateOnNewDocument, as in Selenium-Profiles

With Python

# driver already initialized here

driver.execute_cdp_cmd("Page.removeScriptToEvaluateOnNewDocument", {"identifier":"1"})

See the docs for other programming languages.

2. Using the Chrome DevTools Protocol with Page.addScriptToEvaluateOnNewDocument and a script from undetected-chromedriver

With Python

# driver already initialized here

js = """
    let objectToInspect = window,
    result = [];
    while(objectToInspect !== null)
        { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
        objectToInspect = Object.getPrototypeOf(objectToInspect); }
    result.forEach(p => p.match(/.+_.+_(Array|Promise|Symbol)/ig)
        &&delete window[p]&&console.log('removed',p))
"""
# register the cleanup script so it runs on every new document
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": js})

Note that this one might be detectable.

3. By patching the compiled chromedriver.

In other words, just get rid of all occurrences of cdc_* in the chromedriver binary :)

For a Python script, see undetected_chromedriver/patcher.py
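A minimal sketch of method 3, in the spirit of that patcher but not its actual code: every b"cdc_" in the binary is overwritten with three random lowercase letters plus "_", so the file keeps exactly the same size and no offsets inside the executable move. It assumes b"cdc_" only occurs inside chromedriver's injected JavaScript strings; `patch_cdc_bytes`, `patch_cdc_file` and `random_marker` are hypothetical names. Work on a copy of the original chromedriver.

```python
import random
import string
from pathlib import Path
from typing import Tuple

MARKER = b"cdc_"


def random_marker() -> bytes:
    """Return 3 random lowercase letters + b"_" (same 4-byte length as MARKER)."""
    while True:
        letters = "".join(random.choices(string.ascii_lowercase, k=3))
        candidate = letters.encode("ascii") + b"_"
        if candidate != MARKER:  # don't pick the original marker by chance
            return candidate


def patch_cdc_bytes(data: bytes) -> Tuple[bytes, int]:
    """Replace every MARKER occurrence; returns (patched_bytes, count)."""
    replacement = random_marker()  # one value reused for every occurrence
    count = data.count(MARKER)
    patched = data.replace(MARKER, replacement)
    # Same size means nothing else in the executable was shifted.
    if len(patched) != len(data):
        raise RuntimeError("patch changed the file size")
    return patched, count


def patch_cdc_file(path: str) -> int:
    """Patch a chromedriver copy in place; returns the number of replacements."""
    file = Path(path)
    patched, count = patch_cdc_bytes(file.read_bytes())
    file.write_bytes(patched)
    return count
```

Reusing a single replacement for every occurrence is deliberate: any code that refers to the same name in several places presumably still agrees after the patch. You would then point Selenium at the patched copy, e.g. the chromedriver-modified path from the question.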

Details

The window.cdc_adoQpoasnfa76pfcZLmcfl_Xxxxx variables are added by chromedriver on every new page, using the Chrome DevTools Protocol's Page.addScriptToEvaluateOnNewDocument with this script:

(function () {
    window.cdc_adoQpoasnfa76pfcZLmcfl_Array = window.Array;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Object = window.Object;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Promise = window.Promise;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Proxy = window.Proxy;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol = window.Symbol;
}) ();

Fortunately, it's always the first script added, so its identifier is "1". That's also why method 1 works.

They seem to be hardcoded into chromedriver and used to make sure that chromedriver still works in case a page overwrites these JavaScript properties (code).
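To check which of these injected names the method-2 cleanup script would actually delete, its regex can be tried in Python. This is an approximation: `re.search` with `re.IGNORECASE` stands in for JavaScript's `String.prototype.match` with the `i` flag (the `g` flag does not change whether a match exists), and `would_delete` is a hypothetical helper.

```python
import re

# Same pattern as the method-2 JS; re.IGNORECASE mirrors the JS "i" flag.
PATTERN = re.compile(r".+_.+_(Array|Promise|Symbol)", re.IGNORECASE)


def would_delete(name: str) -> bool:
    """True if the method-2 cleanup script would delete window[name]."""
    return PATTERN.search(name) is not None


for suffix in ["Array", "Object", "Promise", "Proxy", "Symbol"]:
    name = "cdc_adoQpoasnfa76pfcZLmcfl_" + suffix
    print(name, would_delete(name))
```

With this particular pattern, the _Object and _Proxy copies do not match, so that script would leave them on window.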

Note: I am the developer of Selenium-Profiles

Hagler answered 18/3, 2023 at 15:37 Comment(0)
It is unnecessary to recompile unless you want to build your own chromedriver with particular features. Try changing $cdc_asdjflasutopfhvcZLmcfl_ to $abc_asdjflasutopfhvcZLmcfl_.

Remember not to comment out this line, and not to rename the variable to something of a different length. The compiled file is sensitive to this, and a length change may lead to the runtime error you are seeing.
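Before pointing Selenium at an edited binary, a quick sanity check along these lines can catch the length problem: it compares the original and edited bytes and reports whether the size changed. For example, keeping commented-out copies of the key line, as in the question's edit, adds bytes. `compare_bytes` and `compare_files` are hypothetical helpers; an unchanged size does not guarantee the binary will run, but a changed size is a strong hint that something was shifted.

```python
from pathlib import Path


def compare_bytes(original: bytes, edited: bytes) -> dict:
    """Summarize how an edited binary differs from the original."""
    report = {
        "same_size": len(original) == len(edited),
        "size_delta": len(edited) - len(original),
    }
    if report["same_size"]:
        # Only meaningful when sizes match: count positions whose byte changed.
        report["bytes_changed"] = sum(a != b for a, b in zip(original, edited))
    return report


def compare_files(original_path: str, edited_path: str) -> dict:
    return compare_bytes(Path(original_path).read_bytes(),
                         Path(edited_path).read_bytes())


line = b"var key = '$cdc_asdjflasutopfhvcZLmcfl_';"
print(compare_bytes(line, line.replace(b"$cdc_", b"$abc_")))
# {'same_size': True, 'size_delta': 0, 'bytes_changed': 2}
print(compare_bytes(line, b"// " + line))
# {'same_size': False, 'size_delta': 3}
```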

Extrovert answered 30/8, 2018 at 6:0 Comment(1)
This is correct. Keep the $ mark and don't change the length of the key, and you'll be fine. - Catenary

© 2022 - 2024 — McMap. All rights reserved.