I'm aware of this thread and this thread, and of others on the same subject, but the solution everyone points to in the first thread no longer works. So please do not close this as a duplicate just because that first thread exists. The answer there is from 2016, and more recent comments show people having trouble with it.
I'm using Selenium to do some light web scraping. One site I'm interacting with is clearly detecting that my browser is automated (curiously, it only cares when I'm also accessing a version of the site outside my region, but that's neither here nor there).
The solution in the first thread suggests taking the chromedriver downloaded from here and modifying it to get rid of any variables with "$cdc" in their names. So I do the following: download v2.41 from that site and unzip it. This version lets me use Chrome with Selenium via br = webdriver.Chrome('./chromedriver'), but has the automation detection problem. So I cp this to make chromedriver-modified.
I open chromedriver-modified with vim and search for $cdc. On line 1934 or so, I find a function similar to (but slightly different from) the one in the linked thread:
function getPageCache(opt_doc, opt_w3c) {
  var doc = opt_doc || document;
  var w3c = opt_w3c || false;
  // var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'xxxx_asdjflasutopfhvcZLmcfl_';
  // var key = 'randomblahhh_';
  if (w3c) {
    if (!(key in doc))
      doc[key] = new CacheWithUUID();
    return doc[key];
  } else {
    if (!(key in doc))
      doc[key] = new Cache();
    return doc[key];
  }
}
I've tried replacing this variable both with something random (the randomblahhh_ var) and with something that just replaces the first 4 characters of the $cdc one, because I saw both suggested in the comments in that thread (I don't know if some format for the variable is important here).
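For reference, the same substitution can be done without opening the binary in a text editor. This is only a sketch: the helper name patch_marker and the replacement b"xxxx_" are my own placeholders, and I'm assuming (not claiming) that keeping the replacement the same byte length as the original matters for a compiled binary. Whether it changes anything about the detection is a separate question.

```python
import shutil


def patch_marker(src, dst, old=b"$cdc_", new=b"xxxx_"):
    """Copy src to dst, replacing every occurrence of old with new.

    Hypothetical helper, not part of Selenium or chromedriver.
    Returns the number of occurrences that were replaced.
    """
    if len(old) != len(new):
        # Assumption: a different length would shift offsets inside the binary.
        raise ValueError("replacement must be the same length as the original")

    # Read and write in binary mode so no bytes are re-encoded or appended.
    with open(src, "rb") as f:
        data = f.read()

    count = data.count(old)
    patched = data.replace(old, new)

    with open(dst, "wb") as f:
        f.write(patched)

    # Carry over the permission bits (including the executable bit) from src.
    shutil.copymode(src, dst)
    return count
```

With the files from this question, the call would be patch_marker('./chromedriver', './chromedriver-modified'); a return value of 0 would mean the marker was not found at all.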
Neither works. What I mean by that is that when I try to run it with chromedriver-modified, the webdriver won't even start:
>>> from selenium import webdriver
>>> br = webdriver.Chrome(executable_path='./chromedriver-modified')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
    self.service.start()
  File "/usr/lib/python3/dist-packages/selenium/webdriver/common/service.py", line 96, in start
    self.assert_process_still_running()
  File "/usr/lib/python3/dist-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service ./chromedriver-modified unexpectedly exited. Status code was: -11
I've had trouble Googling and figuring out what this status code means. In fact, I found this unanswered reddit thread with the same exact problem.
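The one thing I can say about the number itself: Python's subprocess module reports a negative return code -N on POSIX when the child process was terminated by signal N, and on Linux signal 11 is SIGSEGV (a segmentation fault). Assuming the status code Selenium prints is that raw return code, a small sketch to decode it follows; describe_return_code is a name I made up for illustration.

```python
import signal


def describe_return_code(return_code):
    """Describe a return code as reported by Python's subprocess module."""
    if return_code >= 0:
        return "exited with status {}".format(return_code)

    # On POSIX, a return code of -N means the child was killed by signal N.
    number = -return_code
    try:
        name = signal.Signals(number).name
    except ValueError:
        # The number does not correspond to a signal known on this platform.
        name = "unknown signal {}".format(number)
    return "killed by signal {}".format(name)


print(describe_return_code(-11))
```

That would suggest the modified binary itself is crashing on startup rather than Selenium rejecting it, but it doesn't tell me why.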
The first thread also mentions $wdc variables, but I find no mention of them in chromedriver.
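To double-check that I'm not just missing them in vim, the raw bytes of the file can be searched directly. A minimal sketch, where count_markers is a hypothetical helper and the two byte strings are simply the prefixes mentioned in the first thread:

```python
def count_markers(path, markers=(b"$cdc_", b"$wdc_")):
    """Return how many times each marker occurs in the file at path."""
    # Binary mode: the driver is an executable, not a text file.
    with open(path, "rb") as f:
        data = f.read()
    return {marker.decode("ascii"): data.count(marker) for marker in markers}
```

Calling count_markers('./chromedriver') returns a dict keyed by marker, so a count of 0 for '$wdc_' would confirm that string really isn't in this version.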
Just to preempt the possible suggestion, too: I'm almost 100% confident that it's detecting that I'm using an automated browser because it's automated, not because of something like mouse click speed or anything. If I start the browser with selenium but then manually do the rest, it still causes the problem.
edit: I'm using Chrome v68 from the Ubuntu repos, google-chrome-stable. To be honest I don't need to use Chrome specifically, but the answers I've found seem to center around it rather than Firefox.
edit2: one last comment -- I noticed in the first linked thread that some people were "recompiling":
For me, I used chrome, so, all that I had to do was to ensure that $cdc_ didn't exist anymore as document variable, and voila (download chromedriver source code, modify chromedriver and re-compile $cdc_ under different name.)
I'm not sure what that means -- are they recompiling Chrome itself? All I've done is change the variable in the chromedriver file.