Can a website detect when you are using Selenium with chromedriver?

I've been testing out Selenium with Chromedriver and I noticed that some pages can detect that you're using Selenium even when there's no automation at all. Even when I'm just browsing manually in Chrome launched through Selenium and Xephyr, I often get a page saying that suspicious activity was detected. I've checked my user agent and my browser fingerprint, and they are exactly identical to those of the normal Chrome browser.

When I browse to these sites in normal Chrome everything works fine, but the moment I use Selenium I'm detected.

In theory, chromedriver and Chrome should look literally exactly the same to any web server, but somehow they can detect it.

If you want some test code try out this:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery")
chrome_options.add_argument("--start-maximized")
# Newer Selenium releases take options= instead of the deprecated chrome_options=
driver = webdriver.Chrome(options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800, 800)
driver.set_window_position(0, 0)
print('arguments done')
driver.get('http://stubhub.com')

If you browse around stubhub you'll get redirected and 'blocked' within one or two requests. I've been investigating this and I can't figure out how they can tell that a user is using Selenium.

How do they do it?

I installed the Selenium IDE plugin in Firefox and I got banned when I went to stubhub.com in the normal Firefox browser with only the additional plugin.

When I use Fiddler to view the HTTP requests being sent back and forth I've noticed that the 'fake browser's' requests often have 'no-cache' in the response header.

Results like "Is there a way to detect that I'm in a Selenium Webdriver page from JavaScript?" suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.

The site uploads a fingerprint to their servers, but I checked and the fingerprint of Selenium is identical to the fingerprint when using Chrome.

This is one of the fingerprint payloads that they send to their servers:

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US",
"cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,
"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule",
"3":"NativeClient","4":"ChromePDFViewer"},
"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash",
"2":"FutureSplashPlayerapplication/futuresplash",
"3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm",
"4":"NativeClientExecutableapplication/x-nacl",
"5":"PortableNativeClientExecutableapplication/x-pnacl",
"6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},
"screen":{"width":1600,"height":900,"colorDepth":24},
"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS",
"5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL",
"10":"CourierNew","11":"Courier"}}

It's identical in Selenium and in Chrome.
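One way to double-check that two captured payloads really match is to diff them key by key rather than by eye. The following is only a minimal sketch: the helper name `diff_fingerprints` and the shortened sample payloads are hypothetical, and it assumes both payloads were saved as JSON text.

```python
import json

_MISSING = object()  # sentinel so a missing key differs from an explicit null


def diff_fingerprints(a, b, prefix=""):
    """Return the dotted key paths whose values differ between two dicts."""
    differences = []
    for key in sorted(set(a) | set(b)):
        path = prefix + str(key)
        left = a.get(key, _MISSING)
        right = b.get(key, _MISSING)
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested objects such as "screen" or "plugins".
            differences.extend(diff_fingerprints(left, right, path + "."))
        elif left != right:
            differences.append(path)
    return differences


# Shortened, hypothetical payloads standing in for the two captures.
chrome = json.loads('{"appName":"Netscape","screen":{"width":1600,"height":900}}')
selenium = json.loads('{"appName":"Netscape","screen":{"width":1600,"height":900}}')
print(diff_fingerprints(chrome, selenium))
```

An empty list means the two payloads are field-for-field identical, which would point to the detection happening somewhere other than this fingerprint.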

VPNs work for a single use, but they get detected after I load the first page. Clearly some JavaScript code is being run to detect Selenium.

Magavern answered 20/10, 2015 at 0:8 Comment(19)
I would suggest using a local proxy to take a look at the web traffic going from your request to the server and back. You should be able to tell from there. fiddler, burp, ZED proxy any of them will do the trick. I prefer BurpSchizogenesis
I did this and there's no difference between the requests at all. How are they possibly doing this?Magavern
@RyanWeinstein: It is not traffic. My guess is that Selenium needs to expose some JavaScript hooks which can be detected on the client-side JavaScript.Augmentation
Or if it is traffic then it is a traffic pattern.... you are browsing pages too fast.Augmentation
I'm not browsing too fast. I only load a single page and I navigate through it normally using my mouse and keyboard. Also it doesn't make sense that Selenium needs to expose hooks, because its literally running chrome.exe. It just runs normal chrome and allows you to get data from it. Any other ideas? I was thinking maybe it has something to do with cookies. This is driving me crazy.Magavern
Mikko's idea sounds pretty plausible to me. Selenium needs to be able to detect events so that it can respond to them. So it must be injecting some javascript code which is detected even when you're not actually automating anything.Ecstatics
Its possible, but if that's the case then it would go against everything that anyone knows about Selenium. There is nowhere online where anyone knows a way to detect if a user is using chromedriver. So if there's a way to detect it using frontend javascript then I want to know what it is. #3614972Magavern
Can you post the request headers (except the sensitive info, of course)? That's one... There's also probably some user interaction that the site is expecting which is not happening.Briarroot
This site uses Distil bot detection technology and delivers content using the akamaitechnologies.com CDN from different IPs, e.g. 95.100.59.245, 104.70.243.66, 23.202.161.241Tho
Same happens to 411.comTho
I am able to hit stubhub.com and 411.com with chromedriver/selenium and your listed settings. Reading on distil from SIslam it appears as though they are keeping a fingerprint in memory, which is probably triggered based on behaviour.Joannejoannes
The fingerprint contains some browser information mostly. You say you are able to go to stubhub and navigate around in chromedriver? Are you sure you're using exactly my settings? I can't navigate around using chromedriver so there must be some difference. @JoannejoannesMagavern
I am experiencing the same issue with Selenium and the firefox driver. The interesting thing to note is I am running Selenium in a VMWare Workstation Virtual Machine that is accessing the internet through a NAT. The host machine is able to access stubhub, while the VM is unable to access when using Selenium, or even the browser instance Selenium launched. I had the VM Browser instance Blocked and stubhub still recognizes the machine and has it blocked. So it must be performing a fingerprint of the browser and machine in some manner.Anselmo
@RyanWeinstein Yes fp is generated from about 40 variables that encompasses client and network info! What i read in distill!Tho
Without using pyvirtualdisplay, I can connect to stubhub.com and browse around just fine; I don't get blockedSeptuagenarian
@cwa: Can you post your code? Noone else here seems to have gotten it to work.Rn
Did you try to connect via remote debugging instead of starting a new instance ? sites.google.com/a/chromium.org/chromedriver/help/… The webdriver will be invisible for the website.Novick
In my case it is working. I can open stubhub.com with Selenium and navigate to items afterwards.Ridgley
Relevant Q&A here with working solutions: #53040051Coffelt
E
186

Replacing cdc_ string

You can use Vim or Perl to replace the cdc_ string in chromedriver. See the answer by @Erti-Chris Eelmaa to learn more about that string and how it's a detection point.

Using Vim or Perl prevents you from having to recompile source code or use a hex editor.

Make sure to make a copy of the original chromedriver before attempting to edit it.

Our goal is to alter the cdc_ string, which looks something like $cdc_lasutopfhvcZLmcfl.

The methods below were tested on chromedriver version 2.41.578706.


Using Vim

vim -b /path/to/chromedriver

After running the line above, you'll probably see a bunch of gibberish. Do the following:

  1. Replace all instances of cdc_ with dog_ by typing :%s/cdc_/dog_/g.
    • dog_ is just an example. You can choose anything as long as it has the same number of characters as the search string (e.g., cdc_); otherwise chromedriver will fail.
  2. To save the changes and quit, type :wq! and press return.
    • If you need to quit without saving changes, type :q! and press return.

The -b option tells vim upfront to open the file as a binary, so it won't mess with things like (missing) line endings (especially at the end of the file).


Using Perl

The line below replaces all cdc_ occurrences with dog_. Credit to Vic Seedoubleyew:

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

Make sure that the replacement string (e.g., dog_) has the same number of characters as the search string (e.g., cdc_), otherwise the chromedriver will fail.
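The same replacement can be sketched in Python for systems without Vim or Perl. This is an illustration, not a tested recipe for any particular chromedriver build: the function names `patch_marker` and `patch_file` are hypothetical, and `dog_` is the same arbitrary example string used above.

```python
from pathlib import Path


def patch_marker(data: bytes, old: bytes = b"cdc_", new: bytes = b"dog_") -> bytes:
    """Replace every occurrence of `old` with `new`, keeping the length unchanged."""
    if len(old) != len(new):
        # A different length would shift every byte after the first match.
        raise ValueError("replacement must have the same length as the search string")
    return data.replace(old, new)


def patch_file(src: str, dst: str) -> int:
    """Write a patched copy of `src` to `dst` and return how many markers were replaced."""
    original = Path(src).read_bytes()
    Path(dst).write_bytes(patch_marker(original))
    return original.count(b"cdc_")
```

Writing to a separate `dst` keeps the original binary as a backup. The copy is created as a plain file, so on Linux or macOS it will likely need its executable permission restored before it can be run.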


Wrapping Up

To verify that all occurrences of cdc_ were replaced:

grep "cdc_" /path/to/chromedriver

If no output was returned, the replacement was successful.
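Where grep is unavailable, the same check can be approximated in Python. A small sketch, with the hypothetical helper name `marker_offsets`, that lists the byte offsets at which the marker still occurs:

```python
def marker_offsets(data: bytes, marker: bytes = b"cdc_") -> list:
    """Return the byte offsets of every occurrence of `marker` in `data`."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets


# Tiny stand-in for the binary contents; a real check would read the file
# with open("/path/to/chromedriver", "rb").read().
sample = b"a cdc_ b cdc_"
print(marker_offsets(sample))
print(marker_offsets(sample.replace(b"cdc_", b"dog_")))
```

An empty list corresponds to grep returning no output.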

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don't see killed in the output, you've successfully altered the driver.

Make sure that the name of the altered chromedriver binary is chromedriver, and that the original binary is either moved from its original location or renamed.


My Experience With This Method

I was previously being detected on a website while trying to log in, but after replacing cdc_ with an equal sized string, I was able to log in. Like others have said though, if you've already been detected, you might get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you using a VPN, different network, etc.

Enlistee answered 31/8, 2018 at 3:49 Comment(25)
@LekaBaper Thanks for the heads up. The chromedriver version that I used was version 2.41.578706.Enlistee
Did not worked even when I used this chromedriver.exe modification on new physical computer on different network.Amaris
it's give an error says, this version cannot work in this computer :(Playful
@Enlistee Is there any undetectable open-source fork which you are aware of ?Malkamalkah
@Malkamalkah No, not that I’m aware of.Enlistee
Note that the chromedriver people have declared this issue won't-fix, so you can expect to have to use a fork or edit the binary for the indefinite future. bugs.chromium.org/p/chromedriver/issues/detail?id=3220Bala
hi, @colossatr0n. I've tried with your method and also change my ip, but it still not worked as to this site: authstl.alipay.com/login/index.htm, is there any new magic that i could use.Monotint
TLDR; open binary in hex editor, change string starting with $cdc to some other string of same length, then save and run modified binary.Triphylite
I have seven instances of cdc_ in my Chromedriver. Do I need to replace all or just one? Your Vim/perl methods differs on this point.Shrievalty
@Shrievalty Replace all of them. I'll update the Vim instructions.Enlistee
After modification, I cannot open the chrome driver. Mac shouted that the original file has been altered that's why for security reasons, it did not start.Paredes
for weeks now, I've been finding solutions on why chromedriver keeps giving me 403 error and such. But this where I found my heaven. Thanks for this solution on just editing the chromdriver source and everything works for me.Photima
sed -b "s/cdc_/dog_/g" chromedriver.exe > chromedriver_patched.exe For those with GNU sed on Windows.Inodorous
this breaks my chromedriver.. once I do perl -pi -e 's/cdc_/dog_/g' it breaks the chromedriver and I must restore it using perl -pi -e 's/dog_/cdc_/g' to make it running again.. I tried doing perl -pi -e 's/$cdc_/$dog_/g' but I'm not sure what the effect of it..Flexor
what about geckodriver?Sulphuric
This was THE ONLY (and I mean, really, literally, the only) method that worked for me. Thank you so much.Laspisa
@JugertMucoimaj No, this method is specific for Chrome. GeckoDriver is a completely different binary.Laspisa
@Laspisa you know sum about geckodriverSulphuric
when i use vim in Mac, it shows: E486: Pattern not found: cdc_Smallage
Note that modified chromedriver won't run on M1 Mac. Here's a workaround: https://mcmap.net/q/65248/-chromedriver-killed-on-apple-silicon-when-cdc_-modifiedElectrograph
My Chromedriver doesn't have the cdc stringComber
Have an example video to do that? I am new, I don't understand to do that.Sheepdip
Seems like a lot of manual work, but still it wouldn't cover all eventualities according to other posts in this thread. Wouldn't it be more effective and time-efficient to employ a full-blown package such as undetected-chromedriver or selenium-stealth?Akkadian
Wont work for sites using datadom datadome.co/bot-management-protection/…Intranuclear
Just as a reminder for beginners like me: 1. Remember to wrap the path in double quotes (") in both vim or pearl when modifying, with the administrator mode on. 2. cmd does not have an original command like grep, use findstr instead.Tartrazine
E
263

Basically, Selenium detection works by testing for predefined JavaScript variables that appear when running with Selenium. The bot detection scripts usually look for anything containing the word "selenium" or "webdriver" in any of the variables on the window object, and also for document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are using. Different browsers expose different things.

I used Chrome, so all I had to do was ensure that $cdc_ no longer existed as a document variable, and voilà (download the chromedriver source code, rename $cdc_ to something else, and recompile chromedriver).

This is the function I modified in chromedriver:

File call_function.js:

function getPageCache(opt_doc) {
  var doc = opt_doc || document;
  //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'randomblabla_';
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}

(Note the comment. All I did was turn $cdc_ into randomblabla_.)

Here is pseudocode which demonstrates some of the techniques that bot detection services might use:

var runBotDetection = function () {
    var documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];

    var windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];

    for (const windowDetectionKey of windowDetectionKeys) {
        if (window[windowDetectionKey]) {
            return true;
        }
    }
    for (const documentDetectionKey of documentDetectionKeys) {
        if (window['document'][documentDetectionKey]) {
            return true;
        }
    }

    for (const documentKey in window['document']) {
        if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
            return true;
        }
    }

    if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;

    if (window['document']['documentElement']['getAttribute']('selenium')) return true;
    if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
    if (window['document']['documentElement']['getAttribute']('driver')) return true;

    return false;
};
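The name-based checks above can also be run offline against a list of property names. This Python port is a simplified sketch: the function name `find_automation_markers` is hypothetical, only property names are inspected (the `cache_` value check, the `external` check and the attribute checks are left out), and the key lists are assumed to have been collected from the browser beforehand.

```python
import re

WINDOW_KEYS = {
    "_phantom", "__nightmare", "_selenium",
    "callPhantom", "callSelenium", "_Selenium_IDE_Recorder",
}
DOCUMENT_KEYS = {
    "__webdriver_evaluate", "__selenium_evaluate", "__webdriver_script_function",
    "__webdriver_script_func", "__webdriver_script_fn", "__fxdriver_evaluate",
    "__driver_unwrapped", "__webdriver_unwrapped", "__driver_evaluate",
    "__selenium_unwrapped", "__fxdriver_unwrapped",
}
# Same pattern as the JavaScript version: "$", one lowercase letter, "dc_".
CDC_PATTERN = re.compile(r"\$[a-z]dc_")


def find_automation_markers(window_keys, document_keys):
    """Return the property names that the name-based checks would flag."""
    hits = [key for key in window_keys if key in WINDOW_KEYS]
    hits += [
        key for key in document_keys
        if key in DOCUMENT_KEYS or CDC_PATTERN.search(key)
    ]
    return hits


print(find_automation_markers(
    ["location", "_phantom"],
    ["title", "$cdc_asdjflasutopfhvcZLmcfl_"],
))
```

With Selenium, the key lists could plausibly be gathered with something like `driver.execute_script("return Object.keys(document)")`, though that returns only own enumerable properties, whereas the `for...in` loop above also visits inherited ones.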

According to another answer, there are multiple methods to remove them. One of them is simply opening chromedriver.exe in a hex editor and replacing all occurrences of $cdc_ with another string of the same length.

Esplanade answered 19/12, 2016 at 10:14 Comment(31)
This is interesting, you tested it and it worked even across multiple requests?Magavern
yes it worked without probs, note one problem is if you fell into the "blacklist" BEFORE this change, it's quite hard to get out. if you want to get out of the existing black list, you need to implement fake canvas fingerprinting, disable flash, change IP, and change request header order (swap language and Accept headers). Once you fell into the blacklist, they have very good measures to track you, even if you change IP, even if you open chrome in incognito, etcEsplanade
Just tested and it worked. Be prepared that at least one hour is needed though, to clone chromium repository and build it...Woke
Makalele: I clarified: change the $cdc_ part to something completely random, eg change it to 'MAKALELE_'Esplanade
@Arya chromium.googlesource.com/chromium/src/+/master/docs/…Molotov
@Molotov Thanks. After I follow the instructions (until "fetch chromium") in the url, I can't find "/js" or "/js/call_function.js" file from "chromium/src/" to remove such "$cdc_". Do you know where it is located?Plenty
I found the file "/Users/your_username/chromium/src/chrome/test/chromedriver/js"Plenty
In my case the webdriver is apparently still detected, even though this $cdc_ variable had its name changed. Any other suggestions on what can be done to make Chrome look "100% natural"?Layne
@DennisThrysøe did you see the later part of my post?Esplanade
@Erti-ChrisEelmaa: Yes, none of these seem to be present.Layne
compile older version of Chromium. See when my post was written and take chrome version that was about then roughly. A lot can change in a year.Esplanade
There is a simple, and automatable way to do that, which is to replace cdc_ with a 4 character other string in the chromedriver binary. The way to do that is to use a similar perl command : perl -pi -e 's/cdc_/abc_/g' /path/to/chromedriver. As soon as someone else confirms, feel free to edit the original answerBarbarese
I also find that there is only "cdc_" to replace on Mac and Linux 64, and on Linux 32 one should replace both "wdc_" and "selenium". Again as soon as someone confirms I'll add that to the postBarbarese
I simply replaced $cdc with xxxx in chromedriver.exe in a hex editor and it worked! I also noticed that if you maximize the browser window (rather than use a predefined size) it's detected less often.Month
@RafaelAlmeida ChromeDriver 2.35.528139Month
was this on windows, osx, or linux? Hex editing on osx doesn't seem to work.Alliteration
hex-editted with $zzz_zzzzzzzzzzzzzzzzzzzzzz_ (same amount of characters) but didn't work.Evade
anyone having problems take my JS code that detects bots and upload it to somewhere, then visit that page and see what is being detected.Esplanade
@Alliteration on mac I get selenium.common.exceptions.WebDriverException: Message: Service /Users/ishandutta2007/Downloads/chromedriver2 unexpectedly exited. Status code was: -9Malkamalkah
Tried this with windows 2.44 exe version, only one place has 'cdc_', and replace it seems not working, still can be detected by websites.Acierate
Replacing $cdc_ worked for a couple of times, then got blocked again. It happens only on Linux though, in windows my script works fine. any other solution alongside this one will be of great help!Piperpiperaceous
I'm assuming that Selenium/Chrome71 no longer exposes these vars - I've just tried putting them into the console and they didn't exist: Uncaught ReferenceError: $cdc_asdjflasutopfhvcZLmcfl_ is not definedSalty
@NinoŠkopac you should be trying the example of "how bot networks might use to detect" script I gave at the bottom of the post. the suffix after cdc_ is not constant, and changes between builds. The CDC variable also doesn't appear immediately, it appears when you've been browsing a little on web.Esplanade
I just tried executing your function in Chrome 72 via BrowserStack - it returned false every time (which is good I think).Salty
I used a hex editor (iHex) to edit the chromedriver with version 2.35.528139. I only found one instance of cdc_ which i replaced with xxxx (like that szx did). How can I tell if it worked properly?Piaffe
so in summary there is no way of getting past a website supported by distil networks.... is this the end? no more webscraping :). even if it is not selenium it apperars they detect puthon, so does this mean that scrapy, PySpider, etc.... are also hopeless? i wonder if somebody should open up a SUPER BOUNTY on this one as there seems to be no definitve way forward..Chile
@Erti-ChrisEelmaa Is there any undetectable open-source fork which you are aware of ?Malkamalkah
Seems like a lot of manual work, but still it wouldn't cover all eventualities according to other posts in this thread. Wouldn't it be more effective and time-efficient to employ a full-blown package such as undetected-chromedriver or selenium-stealth?Akkadian
I suppose the developers of the browsers can add a new tricky flag in future, without our knowledge. I wonder if it has ever happened since this answer was written?Inveterate
@Inveterate that can happen, but it is extremely unlikely they would touch an old version, that's why it is important to pay attention to the dates / version numbers. Eg if you want to try this you can go back and get chromedrivers source code at 2016 and build it yourself.Esplanade
@Erti-ChrisEelmaa I see. Probably using a very old browser makes you an outlier. But then, websites might loose many real users if they block old browsers. So the solution would be to update the version of the browser periodically, together with new tricky flagsInveterate
E
151

As we've already figured out in the question and the posted answers, there is an anti-web-scraping and bot detection service called "Distil Networks" (which is now "Imperva") in play here. And, according to the company CEO's interview:

Even though they can create new bots, we figured out a way to identify Selenium the a tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.

It will take time and additional challenges to understand how exactly they are detecting Selenium, but here is what we can say for sure at the moment:

  • it's not related to the actions you take with Selenium. Once you navigate to the site, you get immediately detected and banned. I've tried to add artificial random delays between actions, take a pause after the page is loaded - nothing helped
  • it's not about browser fingerprint either. I tried it in multiple browsers with clean profiles and not, incognito modes, but nothing helped
  • since, according to the hint in the interview, this was "reverse engineering", I suspect this is done with some JavaScript code being executed in the browser revealing that this is a browser automated via Selenium WebDriver

I decided to post it as an answer, since clearly:

Can a website detect when you are using selenium with chromedriver?

Yes.


Also, I haven't experimented with older Selenium and older browser versions. In theory, something could have been added to Selenium at a certain point that the Distil Networks bot detector currently relies on. If that is the case, we might detect (yeah, let's detect the detector) at what point/version the relevant change was made, look into the changelog and changesets and, maybe, this could tell us where to look and what they use to detect a webdriver-powered browser. It's just a theory that needs to be tested.
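If detection really did start with one specific release, the version hunt described above amounts to a binary search. A rough sketch under loud assumptions: `first_detected` and `is_detected` are hypothetical names, the version labels are placeholders rather than real releases, and the search only works if every version after the first detected one is detected as well.

```python
def first_detected(versions, is_detected):
    """Binary search for the oldest version for which is_detected(version) is True.

    `versions` must be ordered oldest to newest. Returns None if no version
    is detected.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_detected(versions[mid]):
            hi = mid          # detection starts at mid or earlier
        else:
            lo = mid + 1      # detection starts after mid
    return versions[lo] if lo < len(versions) else None


# Placeholder version labels; in practice is_detected would launch that
# version against the site and report whether it got blocked.
versions = ["v1", "v2", "v3", "v4", "v5"]
print(first_detected(versions, lambda v: versions.index(v) >= 2))
```

This needs only about log2(n) manual test runs instead of trying every release in turn.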

Eldred answered 28/10, 2015 at 23:39 Comment(7)
@RyanWeinstein well, we have no actual proof and we can only speculate and test. For now, I would say they have a way to detect us using selenium. Try experimenting with selenium versions - this may give you some clues.Eldred
Could it have to do with how ephemeral ports are determined? The method stays away from well-known ranges. github.com/SeleniumHQ/selenium/blob/…Somite
Easyjet are using distilnetwork service, yeah it can block dummy bots but not the complicated ones because we have tested it with more than 2000 requests a day from different IPs (which we re-use again 'same' address) so basicly each IP go for a 5-10 requests a day and from this I can tell that all this bot detecting services are just there to develop and sell some 45% working algorithmes, the scrapper we used was easy to detect I can block it while destilnetworks, squareshield and others couldn't which pushed me to never use any of them.Peignoir
I think they are detecting navigator.webdriver in chrome webdriver. I tried to make navigator.webdriver = false with the help of intoli.com/blog/not-possible-to-block-chrome-headless and #47298377. It returns a bot detect page instead of distilnetworks.com/distil_identify_cookie.htmlNedneda
can you please help solving issue mentioned in this question #74252314Lafountain
They have since rebranded themselves to ImpervaDday
Thanks @mirekphd, added a note in the post!Eldred
G
54

A lot has been analyzed and discussed about websites detecting that a browser is being driven by Selenium-controlled ChromeDriver. Here are my two cents:

According to the article "Browser detection using the user agent", serving different webpages or services to different browsers is usually not among the best of ideas. The web is meant to be accessible to everyone, regardless of which browser or device a user is using. There are best practices for developing a website that progressively enhances itself based on feature availability rather than by targeting specific browsers.

However, browsers and standards are not perfect, and there are still some edge cases where websites detect the browser and whether it is driven by a Selenium-controlled WebDriver. Browsers can be detected in different ways, and some commonly used mechanisms are as follows:

You can find a relevant detailed discussion in How does recaptcha 3 know I'm using selenium/chromedriver?

  • Detecting the term HeadlessChrome within headless Chrome UserAgent

You can find a relevant detailed discussion in Access Denied page with headless Chrome on Linux while headed Chrome works on windows using Selenium through Python

You can find a relevant detailed discussion in Unable to use Selenium to automate Chase site login

  • Using Bot Manager service from Akamai

You can find a relevant detailed discussion in Dynamic dropdown doesn't populate with auto suggestions on https://www.nseindia.com/ when values are passed using Selenium and Python

  • Using Bot Protection service from Datadome

You can find a relevant detailed discussion in Website using DataDome gets captcha blocked while scraping using Selenium and Python

However, using the user agent to detect the browser looks simple, but doing it well is in fact a bit tougher.

Note: At this point it's worth mentioning that it's very rarely a good idea to use user agent sniffing. There is almost always a better and more broadly compatible way to address a given issue.


Considerations for browser detection

The idea behind detecting the browser can be either of the following:

  • Trying to work around a specific bug in some specific variant or specific version of a web browser.
  • Trying to check for the existence of a specific feature that some browsers don't yet support.
  • Trying to provide different HTML depending on which browser is being used.

Alternatives to browser detection through UserAgents

Some of the alternatives to browser detection are as follows:

  • Implementing a test to detect how the browser implements the API of a feature and determining how to use it from that. An example is when Chrome unflagged experimental lookbehind support in regular expressions.
  • Adopting the design technique of Progressive enhancement, which involves developing a website in layers using a bottom-up approach: starting with a simpler layer and improving the capabilities of the site in successive layers, each using more features.
  • Adopting the top-down approach of Graceful degradation, in which we build the best possible site using all the features we want and then tweak it to make it work on older browsers.

Solution

To prevent the Selenium driven WebDriver from getting detected, you can use one or more of the approaches below:

  • Rotating the UserAgent in every execution of your Test Suite using fake_useragent module as follows:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from fake_useragent import UserAgent
    
    options = Options()
    ua = UserAgent()
    userAgent = ua.random  # pick a random real-world user agent string
    print(userAgent)
    options.add_argument(f'user-agent={userAgent}')
    # Selenium 4 takes the driver path through a Service object
    service = Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=options)
    driver.get("https://www.google.co.in")
    driver.quit()
    

You can find a relevant detailed discussion in Way to change Google Chrome user agent in Selenium?

  • Rotating the UserAgent in each of your Tests using Network.setUserAgentOverride through execute_cdp_cmd() as follows:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    
    driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'))
    print(driver.execute_script("return navigator.userAgent;"))
    # Setting user agent as Chrome/83.0.4103.97
    driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'})
    print(driver.execute_script("return navigator.userAgent;"))
    

You can find a relevant detailed discussion in How to change the User Agent using Selenium and Python

  • Changing the value of the navigator.webdriver property to undefined as follows:

    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
      "source": """
        Object.defineProperty(navigator, 'webdriver', {
          get: () => undefined
        })
      """
    })
    

You can find a relevant detailed discussion in Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

  • Changing the values of navigator.plugins, navigator.languages, WebGL, hairline feature, missing image, etc.

You can find a relevant detailed discussion in Is there a version of selenium webdriver that is not detectable?

You can find a relevant detailed discussion in How to bypass Google captcha with Selenium and python?
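
To check whether the navigator.webdriver override from the list above actually took effect, one option is to read the property back through the driver. This is a minimal sketch, assuming `driver` is any Selenium WebDriver instance; the helper name `webdriver_flag_hidden` is hypothetical, and a passing check says nothing about the other detection signals mentioned here:

```python
def webdriver_flag_hidden(driver):
    """Return True if the page no longer sees navigator.webdriver as true.

    `driver` is assumed to expose Selenium's execute_script() method.
    """
    # execute_script() maps JavaScript undefined/null to Python None.
    value = driver.execute_script("return navigator.webdriver;")
    return value in (None, False)
```

Calling `webdriver_flag_hidden(driver)` after `driver.get(...)` gives a quick sanity check before testing against a real site.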


Dealing with reCAPTCHA

When dealing with reCAPTCHA, rather than clicking on the checkbox associated with the text I'm not a robot, it may be easier to get authenticated by extracting and using the data-sitekey.

You can find a relevant detailed discussion in How to identify the 32 bit data-sitekey of ReCaptcha V2 to obtain a valid response programmatically using Selenium and Python Requests?


tl; dr

You can find a cutting edge solution to evade webdriver detection in:

Gemology answered 22/6, 2020 at 17:29 Comment(4)
I tested your python code on bloomberg.com. It still recognizes me as a bot.Postpositive
Changing the property value of navigator for webdriver to undefined worked for me!Aftergrowth
Outdated by stackoverflow.com/a/70133896?Oneida
can you please help solving issue mentioned in this question #74252314Lafountain
G
31

With the availability of Selenium Stealth, evading detection of a Selenium-driven, ChromeDriver-initiated browsing context has become much easier.


selenium-stealth

selenium-stealth is a Python package to prevent detection. It tries to make Python Selenium more stealthy. However, as of now selenium-stealth only supports Selenium with Chrome.

Features that selenium-stealth currently offers:

  • Selenium with selenium-stealth passes all public bot tests.

  • With selenium-stealth, Selenium can do Google account login.

  • selenium-stealth helps maintain a normal reCAPTCHA v3 score.


Installation

Selenium-stealth is available on PyPI so you can install with pip as follows:

pip install selenium-stealth

Compatible code

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium_stealth import stealth
    
    
    options = Options()
    options.add_argument("start-maximized")
    
    # Hide the "Chrome is being controlled by automated test software" infobar
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    s = Service('C:\\BrowserDrivers\\chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=options)
    
    # Selenium Stealth settings
    stealth(driver,
          languages=["en-US", "en"],
          vendor="Google Inc.",
          platform="Win32",
          webgl_vendor="Intel Inc.",
          renderer="Intel Iris OpenGL Engine",
          fix_hairline=True,
      )
    
    driver.get("https://bot.sannysoft.com/")
    
  • Browser Screenshot:

[Screenshot: bot.sannysoft.com test results]


tl; dr

You can find a couple of relevant detailed discussion in:

Gemology answered 27/11, 2021 at 10:23 Comment(4)
How does this compare to undetected-chromedriver?Akkadian
I tried this solution, but it didn't work out for me. They detected that I'm using a bot.Dundalk
Sadly, it's no longer actively maintained (last commit from 2020).Dday
@Dday Agree :) the timeline when this answer was written, Chromium team themselves got shaken up ;)Gemology
G
29

Example of how it's implemented on wellsfargo.com:

try {
 if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
 if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
 if ("__webdriver_script_fn" in document) return !+""
Greg answered 11/9, 2016 at 23:21 Comment(1)
why is the last try not closed ? besides can u explain your answer a little.Malkamalkah
P
25

Obfuscating JavaScript result

I have checked the chromedriver source code. It injects some JavaScript files into the browser.
Every JavaScript file in this link is injected to the web pages: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/js/

So I used reverse engineering and obfuscated the JavaScript files by hex editing. Now I was sure that no more JavaScript variables, function names and fixed strings were used to uncover selenium activity. But still some sites and reCAPTCHA detect Selenium!

Maybe they check the modifications that are caused by chromedriver JavaScript execution :)

Chrome 'navigator' parameters modification

I discovered there are some parameters in 'navigator' that reveal the use of chromedriver.

These are the parameters:

  • "navigator.webdriver" In non-automated mode it is 'undefined'. In automated mode it's 'true'.
  • "navigator.plugins" In headless Chrome, it has 0 length. So I added some fake elements to fool the plugin length checking process.
  • "navigator.languages" was set to default chrome value '["en-US", "en", "es"]'.

So what I needed was a chrome extension to run JavaScript on the web pages. I made an extension with the JavaScript code provided in the article and used another article to add the zipped extension to my project. I have successfully changed the values; but still nothing changed!
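
As an alternative to a custom extension, the same overrides could be injected with the Page.addScriptToEvaluateOnNewDocument CDP command that other answers in this thread use. This is a minimal sketch: `build_navigator_patch` is a hypothetical helper, the fake plugin entries are placeholders rather than realistic Plugin objects, and nothing here guarantees that a particular detector is fooled:

```python
import json


def build_navigator_patch(languages, plugin_count=3):
    """Build JavaScript that overrides navigator.languages and navigator.plugins."""
    fake_plugins = list(range(1, plugin_count + 1))  # placeholder entries only
    return (
        "Object.defineProperty(navigator, 'languages', "
        f"{{get: () => {json.dumps(languages)}}});\n"
        "Object.defineProperty(navigator, 'plugins', "
        f"{{get: () => {json.dumps(fake_plugins)}}});"
    )


script = build_navigator_patch(["en-US", "en"])
print(script)
```

With a Chrome driver, the resulting string would be passed as `driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": script})` before loading the page.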

I didn't find other variables like these, but that doesn't mean they don't exist. reCAPTCHA still detects chromedriver, so there must be more variables to change. The next step would be reverse engineering the detector services, which I don't want to do.

Now I'm not sure whether it is worth spending more time on this automation process or searching for alternative methods!

Procambium answered 5/12, 2018 at 12:56 Comment(1)
Is this on top of possibly removing the $cdc entries via a hex editor?Endoplasm
H
20

Try to use Selenium with a specific user profile of Chrome. That way you can use it as a specific user and define anything you want. When doing so, it will run as a 'real' user. Look at the Chrome process with a process explorer and you'll see the difference in the tags.

For example:

import os

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

username = os.getenv("USERNAME")
# Parenthesized so the concatenation can span two lines
userProfile = ("C:\\Users\\" + username +
    "\\AppData\\Local\\Google\\Chrome\\User Data\\Default")

options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# Add any tag here you want.
options.add_experimental_option(
    "excludeSwitches",
    """
        ignore-certificate-errors
        safebrowsing-disable-download-protection
        safebrowsing-disable-auto-update
        disable-client-side-phishing-detection
    """.split()
)
# Raw string so the backslashes are not treated as escape sequences
chromedriver = r"C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
browser = webdriver.Chrome(service=Service(chromedriver), options=options)

Google Chrome tag list here

Harrell answered 28/10, 2015 at 16:39 Comment(0)
K
18

All I had to do was:

from selenium import webdriver
my_options = webdriver.ChromeOptions()
my_options.add_argument('--disable-blink-features=AutomationControlled')
driver = webdriver.Chrome(options=my_options)

Some more information to this: This relates to website skyscanner.com. In the past I have been able to scrape it. Yes, it did detect the browser automation and it gave me a captcha to press and hold a button. I used to be able to complete the captcha manually, then search flights and then scrape. But this time around after completing the captcha I get the same captcha again and again, just can't seem to escape from it. I tried some of the most popular suggestions to avoid automation being detected, but they didn't work. Then I found this article which did work, and by process of elimination I found out it only took the option above to get around their browser automation detection. Now I don't even get the captcha and everything else seems to be working normally.

Versions I am running currently:

  • OS: Windows 7 64 bit
  • Python 3.8.0 (tags/v3.8.0:fa919fd, 2019-10-14) (MSC v.1916 64 bit (AMD64)) on win32
  • Browser: Chrome Version 100.0.4896.60 (Official Build) (64-bit)
  • Selenium 4.1.3
  • ChromeDriver 100.0.4896.60 chromedriver_win32.zip 930ff33ae8babeaa74e0dd1ce1dae7ff
Karyolysis answered 3/4, 2022 at 13:19 Comment(1)
github.com/diprajpatra/selenium-stealth/issues/9Oneida
W
15

This works for some websites: remove the webdriver property from navigator.

from selenium import webdriver
driver = webdriver.Chrome()
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source":
        "const newProto = navigator.__proto__;"
        "delete newProto.webdriver;"
        "navigator.__proto__ = newProto;"
    })
Waistcoat answered 20/1, 2021 at 19:40 Comment(0)
D
14

partial interface Navigator { readonly attribute boolean webdriver; };

The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.

This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks.

Taken directly from the 2017 W3C Editor's Draft of WebDriver. This heavily implies that, at the very least, future iterations of Selenium's drivers will be identifiable to prevent misuse. Ultimately, it's hard to tell without the source code what exactly causes chromedriver specifically to be detectable.

Drawknife answered 27/1, 2017 at 23:5 Comment(2)
"it's hard to tell without the source code" .. well the source code is freely availableChilpancingo
I meant without the website in question's source code. It's hard to tell what they are checking against.Drawknife
A
10

Firefox is said to set window.navigator.webdriver === true if working with a webdriver. That was according to one of the older specs (e.g.: archive.org) but I couldn't find it in the new one except for some very vague wording in the appendices.

A test for it is in the selenium code in the file fingerprint_test.js where the comment at the end says "Currently only implemented in firefox" but I wasn't able to identify any code in that direction with some simple grepping, neither in the current (41.0.2) Firefox release-tree nor in the Chromium-tree.

I also found a comment for an older commit regarding fingerprinting in the firefox driver b82512999938 from January 2015. That code is still in the Selenium GIT-master downloaded yesterday at javascript/firefox-driver/extension/content/server.js with a comment linking to the slightly differently worded appendix in the current w3c webdriver spec.

Amidase answered 27/10, 2015 at 23:44 Comment(2)
I just tested webdriver with Firefox 55 and I can confirm this is not true. The variable window.navigator.webdriver is not defined.Fortner
Update: I tested with Firefox 65, and this is true: window.navigator.webdriver == trueFortner
C
10

In addition to the great answer of Erti-Chris Eelmaa: there's the annoying window.navigator.webdriver, and it is read-only. Even if you change its value to false, it will still be true. That's why a browser driven by automated software can still be detected.

MDN

The variable is managed by the flag --enable-automation in chrome. The chromedriver launches Chrome with that flag and Chrome sets the window.navigator.webdriver to true. You can find it here. You need to add to "exclude switches" the flag. For instance (Go):

package main

import (
    "fmt"
    "log"

    "github.com/tebeka/selenium"
    "github.com/tebeka/selenium/chrome"
)

func main() {
    caps := selenium.Capabilities{
        "browserName": "chrome",
    }

    // Exclude the switch that makes Chrome set navigator.webdriver
    chromeCaps := chrome.Capabilities{
        Path:            "/path/to/chrome-binary",
        ExcludeSwitches: []string{"enable-automation"},
    }
    caps.AddChrome(chromeCaps)

    wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", 4444))
    if err != nil {
        log.Fatal(err)
    }
    defer wd.Quit()
}
Chicken answered 28/1, 2019 at 14:47 Comment(0)
A
9

The bot detection I've seen seems more sophisticated or at least different than what I've read through in the answers below.

Experiment 1

  1. I open a browser and web page with Selenium from a Python console.
  2. The mouse is already at a specific location where I know a link will appear once the page loads. I never move the mouse.
  3. I press the left mouse button once (this is necessary to take focus from the console where Python is running to the browser).
  4. I press the left mouse button again (remember, cursor is above a given link).
  5. The link opens normally, as it should.

Experiment 2

  1. As before, I open a browser and the web page with Selenium from a Python console.

  2. This time around, instead of clicking with the mouse, I use Selenium (in the Python console) to click the same element with a random offset.

  3. The link doesn't open, but I am taken to a sign up page.

Implications

  • opening a web browser via Selenium doesn't preclude me from appearing human
  • moving the mouse like a human is not necessary to be classified as human
  • clicking something via Selenium with an offset still raises the alarm

It seems mysterious, but I guess they can just determine whether an action originates from Selenium or not, while they don't care whether the browser itself was opened via Selenium or not. Or can they determine if the window has focus? It would be interesting to hear if anyone has any insights.

Ability answered 11/4, 2018 at 18:41 Comment(5)
My belief is that Selenium injects something into the page via javascript to find and access elements. This injection is what I believe they are detecting.Sleazy
You are right, this test is 100% valid. I did a similar test with the same results. I could send Enter, Tab, or other keys. The moment I accessed elements, the page stopped working. So if the driver injects some JavaScript into the browser, we could just encrypt that JavaScript using a Chrome extension and decrypt it on the next page using the same extension. I will try to look at it in the following days.Erudite
Could you provide a link to test this behavior? I would like to investigate this detection method and create a bypassErosion
I'd be interested to see if you could bypass this in headless chrome, as extensions are supported hereLibya
it doesn't work like that. A website can use a variety of services and methods to detect bots. The best way is just detect selenium through the fingerprints. But there are many others.Elegancy
C
9

One more thing I found is that some websites use a platform that checks the User Agent. If the value contains "HeadlessChrome", the behavior can be weird when using headless mode.

The workaround for that is to override the user agent value, for example in Java:

chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
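
Instead of hard-coding a full string, the override could be derived from the browser's own user agent so that the version number stays in sync. This is a small Python sketch; the helper name `strip_headless_marker` and the sample string are illustrative assumptions:

```python
def strip_headless_marker(user_agent):
    """Replace the 'HeadlessChrome' token with 'Chrome' in a user agent string."""
    return user_agent.replace("HeadlessChrome", "Chrome")


headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/73.0.3683.86 Safari/537.36"
)
print(strip_headless_marker(headless_ua))
```

In practice the input would come from `navigator.userAgent` in a first headless session, and the cleaned value would be passed as the `--user-agent=` argument of the next one.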
Conurbation answered 3/4, 2019 at 15:12 Comment(0)
S
7

It sounds like they are behind a web application firewall. Take a look at modsecurity and OWASP to see how those work.

In reality, what you are asking is how to do bot detection evasion. That is not what Selenium WebDriver is for. It is for testing your web application not hitting other web applications. It is possible, but basically, you'd have to look at what a WAF looks for in their rule set and specifically avoid it with selenium if you can. Even then, it might still not work because you don't know what WAF they are using.

You did the right first step, that is, faking the user agent. If that didn't work though, then a WAF is in place and you probably need to get more tricky.

Point taken from other answer. Make sure your user agent is actually being set correctly first. Maybe have it hit a local web server or sniff the traffic going out.

Snowonthemountain answered 23/10, 2015 at 23:28 Comment(15)
I think you are on the correct path. I tested with my setup and replaced the User Agent with a valid user agent string that successfully went through and received the same result, stubhub blocked the request.Anselmo
Okay, if the user agent is fine, then they have attack detection in place for sure. WAF is a good place to start. Not that I'm condoning hitting other websites. I'm just answering in the name of science and advancement of human knowledge.Snowonthemountain
Try adding: driver.set_preference("general.useragent.override","your_user_agent_string")Seiter
This is purely for science. I want to know how it's possible they are detecting something that should theoretically be undetectable. Maybe they have a small backdoor in Selenium that does a small obscure js thing that they can use to detect it.Magavern
There are a lot of ways to detect if a user is a bot or is possibly malicious in general. It has nothing to do with selenium itself. In this case, selenium is acting suspiciously. One way to detect bad behavior or suspicious behavior is by checking behavior heuristics. For example, how fast a user is consuming http requests. If they are hitting the same url from the same ip over and over again he could be attempting a Denial of Service attack. Etc...Snowonthemountain
This topic is very vast, I would say if you don't understand it, and you want to understand it, here is not the right place. Start with owasp. Look into penetration testing and web security. Also, like I said before, look into modsecurity and WAF for specifically this topic.Snowonthemountain
Yeah but I want to know how they do it in this instance. I'm not flooding them with requests. There isn't any suspicious HTTP traffic. It's just me using a browser to go on their website, and somehow they know it's Selenium. In this case, Selenium is acting suspiciously and I need to know why.Magavern
Okay, so again there is no simple and sweet answer to give, but checkout this article blogs.akamai.com/2013/06/…Snowonthemountain
One thing to consider is are all your http headers being set correctly. Headers are a big thing that a browser takes care of for you. Are you also setting those. That is another type of low hanging fruit you can try. Since you don't know what is actually triggering the blocking, you have to do it by trial and error. They also sometimes check if javascript is active, how the client responds to http header changes, some even track click patterns. It really depends on the type of security they have in place.Snowonthemountain
Finally, if you want the literal rules you are probably looking at evading, go back to my original hint. Look at mod_security. Here are the base rules that pretty much any security company has to cover to be even respectable IMHO. They are the owasp core rule set and were the target of my original hint, but I think you didn't catch my drift: github.com/SpiderLabs/owasp-modsecurity-crsSnowonthemountain
If it was an HTTP header issue then wouldn't the normal browser get blocked? The HTTP headers are exactly the same. Also what exactly am I looking at with that github link? Have you tried using selenium to go on stubhub? Something is very very off.Magavern
Look at builtwith.com/stubhub.com it tells you what they are using. They are using Akamai which uses the mod_security core rule set. Again, what I hinted at. If you don't even know what you are looking at and are not interested in learning, then you probably should just stop trying. Honestly, this is becoming borderline spoon-feeding. I don't know what you are trying to do, so it might even be borderline illegal. The github link literally tells you what rules they are using. If you don't know how to read the rules, then you probably can't figure out how to bypass them.Snowonthemountain
I will not find the exact rule that is blocking you and tell you how to avoid it. That is way beyond the ethical line. I can help you understand why something is happening and point you in the direction of greater understanding, but it is unethical for me to help you abuse a web application. Their security is there for a reason and you are trying to bypass it. If you were hitting your own application and you gave me access, I would probably be a lot more helpful.Snowonthemountain
I'm sorry for the confusion. I'll look into that and you don't have to help me anymore if you don't want to. Most of my experience is in programming systems applications, so I was not familiar with these modsecurity rules that you're talking about. I'll take a look and try to educate myself. I'm not trying to bypass anything, I was just interested in knowing how these websites detect a user using selenium.Magavern
I'm a developer too :). Learning is a cause I can get behind. I don't mind helping, I just wanted to make clear that I didn't know your intentions and could not exactly help you bypass their website security. To answer your question though, it is not selenium that they are detecting. The rules detected suspicious behavior and decided to take the appropriate measures against the offending client. They catch you by what you are not doing more than by what you are doing. In the repo link, you can checkout this file to get an idea base_rules/modsecurity_crs_20_protocol_violations.confSnowonthemountain
B
6

Even if you are sending all the right data (e.g. Selenium doesn't show up as an extension, you have a reasonable resolution/bit-depth, &c), there are a number of services and tools which profile visitor behaviour to determine whether the actor is a user or an automated system.

For example, visiting a site then immediately going to perform some action by moving the mouse directly to the relevant button, in less than a second, is something no user would actually do.

It might also be useful as a debugging tool to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it'll also help you verify whether there are any specific parameters that indicate you're running in Selenium.
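
If timing is one of the profiled signals, a script can at least avoid acting instantly and at perfectly regular intervals. This is a rough sketch: the bounds of 0.8 and 2.5 seconds are arbitrary assumptions, the helper name is hypothetical, and uniform jitter is only a crude stand-in for real human timing:

```python
import random


def human_like_delays(count, low=0.8, high=2.5, seed=None):
    """Return `count` random pauses (in seconds) drawn uniformly from [low, high]."""
    rng = random.Random(seed)  # a seed makes a run reproducible for debugging
    return [rng.uniform(low, high) for _ in range(count)]


delays = human_like_delays(3, seed=42)
```

Each value would be passed to `time.sleep()` between two automated actions.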

Bain answered 25/10, 2015 at 22:1 Comment(1)
I've already used that website and the fingerprint is identical to my normal browser. Also I'm not automating anything. I'm just browsing as normal.Magavern
E
6

Answer: YES

Some sites will detect Selenium by the browser's fingerprints and other data; other sites will detect Selenium based on behavior, not only based on what you do, but on what you don't do as well.

The data that Selenium exposes is usually enough to detect it.

You can check the browser fingerprints on sites like these:

https://bot.sannysoft.com
https://fingerprintjs.github.io/fingerprintjs/
https://antoinevastel.com/bots/

Try with your regular browser, then try with Selenium, and you'll see the differences.

You can change some fingerprints with options(), like the user agent and others; see the results for yourself.

You can try to avoid this detection in many ways. I recommend using this library, undetected_chromedriver:

https://github.com/ultrafunkamsterdam/undetected-chromedriver

import undetected_chromedriver.v2 as uc

Otherwise you can try using an alternative to Selenium. I have heard of PhantomJS, but haven't tried it.

Elegancy answered 11/8, 2021 at 15:56 Comment(4)
If you take a look thru the code you'll see that he's automatically implemented all of the aspects covered here in this thread, including hex editing the chromedriver.Gilead
I think he didn't try undetected chromedriver or an alternative to Selenium. Another thing I learned recently, though I don't know if I understood it perfectly: it seems that Selenium doesn't actually make clicks, it 'simulates' them by making HTTP requests. This is a big way to detect Selenium, because humans make real clicks.Elegancy
That's interesting actually- maybe it's best to "click" using javascript execution instead? Along the same thread, I noticed on one site in particular if I used driver.refresh() I got flagged right away. Might be the same mechanism you're describing?Gilead
In that case I'm not sure why that's happening but you could save cookies to a pickle file, then load cookies again and then driver.get(url) , instead of using driver.refresh(). If you have doubts with how to load cookies check this link: https://stackoverflow.com/questions/15058462/how-to-save-and-load-cookies-using-python-selenium-webdriverElegancy
C
6

The Chromium developers recently added a 2nd headless mode in 2021, which no longer adds HeadlessChrome to the user agent string. See https://bugs.chromium.org/p/chromium/issues/detail?id=706008#c36

They later renamed the option in 2023 for Chrome 109 -> https://github.com/chromium/chromium/commit/e9c516118e2e1923757ecb13e6d9fff36775d1f4

The newer --headless=new flag will now allow you to get the full functionality of Chrome in the new headless mode, and you can even run extensions in it, for Chrome 109 and above. (If using Chrome 96 through 108, use the older --headless=chrome option.)

Usage: (Chrome 109 and above):

options.add_argument("--headless=new")

Usage: (Chrome 96 through Chrome 108):

options.add_argument("--headless=chrome")

This new headless mode makes Chromium browsers work just like regular mode, which means they won't be as easily detected as Chrome in the older headless mode.
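
The version ranges above can be folded into one small helper so the right flag is picked automatically. This is a sketch, assuming the Chrome major version is already known; the function name is hypothetical, and the fallback to plain `--headless` for versions below 96 is an assumption, not something stated above:

```python
def headless_argument(chrome_major_version):
    """Pick the headless flag for a given Chrome major version."""
    if chrome_major_version >= 109:
        return "--headless=new"
    if chrome_major_version >= 96:
        return "--headless=chrome"
    return "--headless"  # assumption: fall back to the original headless mode
```

Usage would look like `options.add_argument(headless_argument(110))`.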

Combine that with other tools such as undetected-chromedriver for maximum evasion against Selenium-detection.


You can also use the anti-detection mechanisms that SeleniumBase provides:

pip install seleniumbase, then run the following with python:

from seleniumbase import Driver
import time

driver = Driver(uc=True)
driver.get("https://nowsecure.nl/#relax")
time.sleep(6)
driver.quit()

This script bypasses detection on a site that would normally block selenium.

SeleniumBase also has other formats with its own API:

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.open("https://nowsecure.nl/#relax")
    sb.sleep(3)
    if not sb.is_text_visible("OH YEAH, you passed!", "h1"):
        sb.get_new_driver(undetectable=True)
        sb.open("https://nowsecure.nl/#relax")
        sb.sleep(3)
    sb.assert_text("OH YEAH, you passed!", "h1", timeout=3)

(That's an example script that bypasses Cloudflare detection, like the script above it.)

Canister answered 24/9, 2022 at 21:27 Comment(0)
E
5

Some sites are detecting this:

function d() {
try {
    if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
        return !0
} catch (e) {}

try {
    //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
    if (window.document.documentElement.getAttribute("webdriver"))
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
    if ("_Selenium_IDE_Recorder" in window)
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
    if ("__webdriver_script_fn" in document)
        return !0
} catch (e) {}
}
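
To see which of these markers your own Selenium session exposes, the same checks can be run from Python through execute_script. This is a minimal sketch: the function names are hypothetical, the marker list only covers what the snippet above checks for, and real sites may look for other signals:

```python
import json

# Marker names copied from the detection snippet above.
DOCUMENT_KEYS = ["$cdc_asdjflasutopfhvcZLmcfl_", "__webdriver_script_fn"]
WINDOW_KEYS = ["_Selenium_IDE_Recorder"]
ROOT_ATTRIBUTES = ["webdriver"]


def build_probe_script():
    """Return JavaScript that lists which automation markers the page can see."""
    return (
        "var found = [];"
        f"{json.dumps(DOCUMENT_KEYS)}.forEach(function (k) {{"
        " if (k in document) found.push('document.' + k); });"
        f"{json.dumps(WINDOW_KEYS)}.forEach(function (k) {{"
        " if (k in window) found.push('window.' + k); });"
        f"{json.dumps(ROOT_ATTRIBUTES)}.forEach(function (a) {{"
        " if (document.documentElement.getAttribute(a)) found.push('attribute ' + a); });"
        "return found;"
    )


def find_markers(driver):
    """Run the probe in the current page; `driver` needs Selenium's execute_script()."""
    return driver.execute_script(build_probe_script())
```

An empty list from `find_markers(driver)` only means these particular checks would not fire.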
Erosion answered 22/8, 2017 at 9:52 Comment(1)
This doesn't work for Chrome and Firefox, selenium 3.5.0, ChromeDriver 2.31.488774, geckodriver 0.18.0Andvari
I
5

It seems to me the simplest way to do it with Selenium is to intercept the XHR that sends back the browser fingerprint.

But since this is a Selenium-only problem, it’s better just to use something else. Selenium is supposed to make things like this easier, not way harder.

Inattention answered 2/12, 2018 at 1:32 Comment(5)
What are other options to selenium?Gabriellegabrielli
I guess Requests would be the main python option. If you send the same exact requests that your browser sends, you will appear as a normal browser.Inattention
Actually you have to use selenium if the target website uses javascript for some things you need to access/do. Else, you should use request because is much faster. I think the thing is to find some other chromedriver/solution similar to selenium. I heard of phantomJS, I'll try.Elegancy
@Elegancy - these days I'm recommending python playwright, it's getting harder to spoof requests.Inattention
@Inattention - Do you mean you use just python, without the requests module ? If that's correct how do you do that ?Elegancy
A
4

Write an HTML page with the following code. You will see that in the DOM selenium applies a webdriver attribute in the outerHTML:

<html>
<head>
  <script type="text/javascript">
  <!--
    function showWindow(){
      alert(document.documentElement.outerHTML);
    }
  //-->
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>
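The same check can be scripted instead of read off an alert box. The following is a minimal sketch, assuming you already have the page's outerHTML as a string (for example, returned from the browser by a script call). The helper name `root_has_webdriver_attribute` is made up for illustration and is not part of Selenium.

```python
from html.parser import HTMLParser


class _RootAttributeParser(HTMLParser):
    """Records the attributes of the first <html> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.root_attrs = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "html" and self.root_attrs is None:
            self.root_attrs = dict(attrs)


def root_has_webdriver_attribute(outer_html):
    """Return True if the <html> element carries a "webdriver" attribute."""
    parser = _RootAttributeParser()
    parser.feed(outer_html)
    return parser.root_attrs is not None and "webdriver" in parser.root_attrs


# Hypothetical sample strings, not captured from a real browser.
print(root_has_webdriver_attribute('<html webdriver="true"><body></body></html>'))  # True
print(root_has_webdriver_attribute("<html><body></body></html>"))  # False
```

This only inspects the root element, which is where the attribute shows up in the page above; a site-side check would do the equivalent in JavaScript.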
Anisometropia answered 28/10, 2015 at 4:10 Comment(2)
The attribute is added only in Firefox. - Exonerate
And it is possible to remove it from the Selenium extension that controls the browser. It will work anyway. - Ridgley
N
2

You can try excluding the "enable-automation" switch:

var options = new ChromeOptions();

// hide selenium
options.AddExcludedArguments(new List<string>() { "enable-automation" });

var driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options);

Be warned, though, that this workaround stopped working as of ChromeDriver 79.0.3945.16, so you will probably need an older version of Chrome.

As another option, you can try InternetExplorerDriver instead of Chrome. In my experience, IE is not blocked at all, even without any hacks.

For more information, see:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Unable to hide "Chrome is being controlled by automated software" infobar within Chrome v76

Nucleolated answered 10/1, 2020 at 11:57 Comment(0)
P
1

I've found changing the JavaScript "key" variable like this:

//Fools the website into believing a human is navigating it
((JavascriptExecutor)driver).executeScript("window.key = \"blahblah\";");

works for some websites when using Selenium WebDriver along with Google Chrome, since many sites check for this variable in order to avoid being scraped by Selenium.

Parchment answered 3/5, 2019 at 14:36 Comment(0)
E
1

I had the same problem and solved it with the following configuration (in C#):

options.AddArguments("start-maximized");
options.AddArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");

options.AddExcludedArgument("enable-automation"); // For hiding chrome being controlled by automation..
options.AddAdditionalCapability("useAutomationExtension", false);

// Import cookies
options.AddArguments("user-data-dir=" + userDataDir);

options.AddArguments("profile-directory=" + profileDir);
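For readers using the Python bindings from the question, here is a rough sketch of the same settings expressed as the raw `goog:chromeOptions` capability that ChromeDriver reads. It only builds a dictionary and does not start a browser. The function name, its parameters and the placeholder paths are assumptions for this example, and `useAutomationExtension` may be ignored by newer ChromeDriver releases.

```python
import json


def build_chrome_capabilities(user_agent, user_data_dir, profile_dir):
    """Build a capabilities dict mirroring the C# options above."""
    chrome_options = {
        "args": [
            "start-maximized",
            "--user-agent=" + user_agent,
            # Reuse an existing profile so its cookies are available.
            "user-data-dir=" + user_data_dir,
            "profile-directory=" + profile_dir,
        ],
        # Ask ChromeDriver not to pass --enable-automation to Chrome.
        "excludeSwitches": ["enable-automation"],
        "useAutomationExtension": False,
    }
    return {"browserName": "chrome", "goog:chromeOptions": chrome_options}


# Placeholder values; substitute your own user agent and profile paths.
capabilities = build_chrome_capabilities(
    user_agent="Mozilla/5.0 (example user agent)",
    user_data_dir="/path/to/user-data",
    profile_dir="Default",
)
print(json.dumps(capabilities, indent=2))
```

With the Python bindings, the same keys are usually set through `ChromeOptions.add_argument` and `ChromeOptions.add_experimental_option` rather than by hand.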
Etiquette answered 15/10, 2021 at 23:24 Comment(0)
S
0

It is possible to make your web browser and driver undetectable. But you need to understand that browser and driver developers are always under heavy pressure: their headquarters want to stay legal, so they accept special detection mechanisms. You will therefore never get a stealth browser and driver out of the box. The solution is the following:

  1. Patch the web browser, the driver and Selenium itself, removing everything that can give the automation away.
  2. Do not share your patches with anyone. If you publish them, they will be defeated by a new detection mechanism.
  3. Implement automated tests that compare a vanilla browser environment with the patched browser + driver environment. The two should be identical from every possible point of view. Do not publish your tests.
  4. Maintain your patches by porting them to each new version of the web browser, driver and Selenium.
  5. Maintain your tests and update the patches when the tests fail.

Is it possible to protect your website from bots? Generally speaking, yes, but the only good solution is a CAPTCHA. Do not rely on navigator, the JS environment, unique event behaviour, etc. Please don't expect the patches you face to be fluffy toys like undetected-chromedriver, selenium-stealth, etc.

Always remember that detection means asking something of an unknown application on the client side. The client may remove anything by patching his own application, and you don't know how much he (or his employee) knows about the browser and driver source code. You have no chance of detecting him if his employee took part in open-source web browser development.

Just for example (stuff mentioned in this question):

  1. navigator.webdriver
  2. cdc_
  3. HeadlessChrome
  4. Languages
  5. __webdriver

Everything in this list can be hidden or removed in 5 minutes, but there are many other side effects that can betray the bot.
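
To make the list concrete, here is a small sketch of how a detector might scan for these markers. It works on a snapshot dictionary rather than a live browser. The snapshot layout (`webdriver`, `user_agent`, `languages`, `property_names`) and the function name are assumptions made for this example, and treating an empty language list as suspicious is my reading of the "Languages" item. A real detector would look at far more than this.

```python
def find_automation_markers(snapshot):
    """Return the names of the listed markers present in a snapshot."""
    names = snapshot.get("property_names", [])
    found = []
    if snapshot.get("webdriver"):
        found.append("navigator.webdriver")
    if any("cdc_" in name for name in names):
        found.append("cdc_")
    if "HeadlessChrome" in snapshot.get("user_agent", ""):
        found.append("HeadlessChrome")
    # Assumption: an empty or missing language list counts as a marker.
    if not snapshot.get("languages"):
        found.append("languages")
    if any(name.startswith("__webdriver") for name in names):
        found.append("__webdriver")
    return found


# Hypothetical snapshot of an unpatched automated browser.
bot = {
    "webdriver": True,
    "user_agent": "Mozilla/5.0 HeadlessChrome/73.0.3683.86",
    "languages": [],
    "property_names": ["$cdc_asdjflasutopfhvcZLmcfl_", "__webdriver_script_fn"],
}
print(find_automation_markers(bot))
```

Each check here is trivial to defeat on its own, which is the point of the paragraph above: the hard part is the side effects that are not on any public list.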

Situs answered 11/5, 2023 at 19:23 Comment(0)
