Referer missing in HTTP header of Selenium request

I'm writing some tests with Selenium and noticed that the Referer header is missing from the request headers. I wrote the following minimal example to test this with https://httpbin.org/headers:

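import selenium.webdriver
# support.ui is imported explicitly so that WebDriverWait is available below
import selenium.webdriver.support.ui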

options = selenium.webdriver.FirefoxOptions()
options.add_argument('--headless')

profile = selenium.webdriver.FirefoxProfile()
profile.set_preference('devtools.jsonview.enabled', False)

driver = selenium.webdriver.Firefox(firefox_options=options, firefox_profile=profile)
wait = selenium.webdriver.support.ui.WebDriverWait(driver, 10)

driver.get('http://www.python.org')
assert 'Python' in driver.title

url = 'https://httpbin.org/headers'
driver.execute_script('window.location.href = "{}";'.format(url))
wait.until(lambda driver: driver.current_url == url)
print(driver.page_source)

driver.close()

Which prints:

<html><head><link rel="alternate stylesheet" type="text/css" href="resource://content-accessible/plaintext.css" title="Wrap Long Lines"></head><body><pre>{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.5", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
  }
}
</pre></body></html>

So there is no Referer. However, if I browse to any page and manually execute

window.location.href = "https://httpbin.org/headers"

in the Firefox console, Referer does appear as expected.
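Rather than reading the printed page source by eye, the check can be scripted. This is a minimal sketch, assuming the browser wraps the httpbin JSON in a <pre> element as in the output above; the helper name headers_from_page_source is hypothetical and not part of Selenium.

```python
import html
import json
import re


def headers_from_page_source(page_source):
    """Extract the "headers" dict from httpbin's /headers page as rendered by a browser."""
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page_source, re.DOTALL)
    # Fall back to the raw text if the JSON was not wrapped in <pre>.
    body = match.group(1) if match else page_source
    # Undo HTML entity escaping before parsing the JSON.
    return json.loads(html.unescape(body))["headers"]


# Shortened stand-in for driver.page_source (assumed shape, see above).
sample = '<html><head></head><body><pre>{"headers": {"Host": "httpbin.org"}}\n</pre></body></html>'
headers = headers_from_page_source(sample)
print("Referer" in headers)
```

In the real test, driver.page_source would be passed in place of sample, and the result could be asserted on directly.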


As pointed out in the comments below, when using

driver.get("javascript: window.location.href = '{}'".format(url))

instead of

driver.execute_script("window.location.href = '{}';".format(url))

the request does include Referer. Also, when using Chrome instead of Firefox, both methods include Referer.
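To avoid quoting mistakes when building the javascript: URL for the driver.get variant, the target can be embedded with json.dumps. This is only a sketch: the helper name javascript_navigation_url is hypothetical, and it assumes the driver treats a double-quoted string in the URL the same way as the single-quoted one used above.

```python
import json


def javascript_navigation_url(target_url):
    """Build a javascript: URL that assigns target_url to window.location.href."""
    # json.dumps yields a double-quoted literal that is also valid JavaScript,
    # so quotes or backslashes in the URL cannot break out of the string.
    return "javascript: window.location.href = {};".format(json.dumps(target_url))


nav_url = javascript_navigation_url("https://httpbin.org/headers")
print(nav_url)
```

With a live session, nav_url would be passed to driver.get in place of the hand-formatted string.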

So the main question still stands: Why is Referer missing in the request when sent with Firefox as described above?

Coachandfour answered 9/1, 2019 at 23:1 Comment(3)
The issue here is a bug in the Firefox driver / Marionette. To get the Referer, run driver.get("javascript: window.location.href = 'https://httpbin.org/headers' "). - Godrich
It's a bug, since the default policy in place should not block it, and mostly because the header is present when the location is changed directly via the console or when the driver is switched to Chrome. - Godrich
Nope, if it were a policy defined by the gecko driver, then you wouldn't get the Referer when the location is changed manually in the console. My guess is that the JavaScript sandbox is somehow interfering. - Godrich

Referer as per the MDN documentation

The Referer request header contains the address of the previous web page from which a link to the currently requested page was followed. The Referer header allows servers to identify where people are visiting them from and may use that data for analytics, logging, or optimized caching, for example.

Important: Although this header has many innocent uses it can have undesirable consequences for user security and privacy.

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer


However:

A Referer header is not sent by browsers if:

  • The referring resource is a local "file" or "data" URI.
  • An unsecured HTTP request is used and the referring page was received with a secure protocol (HTTPS).

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer
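The two quoted conditions can be written down as a small predicate to see whether they apply to a given navigation. This is a sketch of those two conditions only; it does not model Referrer-Policy or any browser internals, and the function name referer_is_omitted is made up for illustration.

```python
from urllib.parse import urlsplit


def referer_is_omitted(referring_url, target_url):
    """Return True if either of the two quoted conditions applies."""
    referrer = urlsplit(referring_url)
    target = urlsplit(target_url)
    # Condition 1: the referring resource is a local "file" or "data" URI.
    if referrer.scheme in ("file", "data"):
        return True
    # Condition 2: the referring page was received over HTTPS and the
    # request itself is unsecured HTTP.
    return referrer.scheme == "https" and target.scheme == "http"


# The navigation from the question: an HTTPS page requesting another HTTPS page.
print(referer_is_omitted("https://www.python.org/", "https://httpbin.org/headers"))
```

For the navigation in the question, neither condition applies, so these rules alone do not account for the missing header.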


Privacy and security concerns

There are some privacy and security risks associated with the Referer HTTP header:

The Referer header contains the address of the previous web page from which a link to the currently requested page was followed, which can be further used for analytics, logging, or optimized caching.

Source: https://developer.mozilla.org/en-US/docs/Web/Security/Referer_header:_privacy_and_security_concerns#The_referrer_problem


Addressing the security concerns

From the perspective of the Referer header, the majority of security risks can be mitigated with the following steps:

  • Referrer-Policy: Using the Referrer-Policy header on your server to control what information is sent through the Referer header. Again, a directive of no-referrer would omit the Referer header entirely.
  • The referrerpolicy attribute on HTML elements that are in danger of leaking such information (such as <img> and <a>). This can for example be set to no-referrer to stop the Referer header being sent altogether.
  • The rel attribute set to noreferrer on HTML elements that are in danger of leaking such information (such as <img> and <a>).
  • The Exit Page Redirect technique: The only method that should currently work without flaw is to have an exit page that you don’t mind appearing in the Referer header. Many websites implement this method, including Google and Facebook. If implemented correctly, the referrer data no longer shows private information, only the website that the user came from. Instead of appearing as http://example.com/user/foobar, the new referrer data will appear as http://example.com/exit?url=http%3A%2F%2Fexample.com. The method works by having all external links on your website go to an intermediary page that then redirects to the final page. For a link to the website example.com, the full URL is URL-encoded and added to the url parameter of the exit page.
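The URL encoding step of the exit page technique can be sketched with the standard library. The helper name exit_page_url is hypothetical; the exit page address and the url parameter are taken from the example in the list above.

```python
from urllib.parse import parse_qs, quote, urlsplit


def exit_page_url(exit_page, destination):
    """Wrap destination in a link to the exit page, URL-encoding it fully."""
    # safe="" also encodes ":" and "/", matching the encoded form shown above.
    return "{}?url={}".format(exit_page, quote(destination, safe=""))


link = exit_page_url("http://example.com/exit", "http://example.com")
print(link)  # http://example.com/exit?url=http%3A%2F%2Fexample.com

# The exit page can recover the destination from the query string before redirecting.
destination = parse_qs(urlsplit(link).query)["url"][0]
print(destination)  # http://example.com
```

A real exit page would typically also validate the destination before redirecting, so that it cannot be abused as an open redirect.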


This use case

I have executed your code with both the GeckoDriver/Firefox and the ChromeDriver/Chrome combinations:

Code Block:

from selenium.webdriver.support.ui import WebDriverWait

# driver is the Firefox or Chrome session under test
driver.get('http://www.python.org')
assert 'Python' in driver.title

url = 'https://httpbin.org/headers'
driver.execute_script('window.location.href = "{}";'.format(url))
WebDriverWait(driver, 10).until(lambda driver: driver.current_url == url)
print(driver.page_source)

Observation:

  • Using GeckoDriver/Firefox, the Referer: "https://www.python.org/" header was missing:

        {
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
            "Accept-Encoding": "gzip, deflate, br", 
            "Accept-Language": "en-US,en;q=0.5", 
            "Host": "httpbin.org", 
            "Upgrade-Insecure-Requests": "1", 
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"
          }
        }
    
  • Using ChromeDriver/Chrome, the Referer: "https://www.python.org/" header was present:

        {
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", 
            "Accept-Encoding": "gzip, deflate, br", 
            "Accept-Language": "en-US,en;q=0.9", 
            "Host": "httpbin.org", 
            "Referer": "https://www.python.org/", 
            "Upgrade-Insecure-Requests": "1", 
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
          }
        }
    

Conclusion:

It seems to be an issue with GeckoDriver/Firefox in handling the Referer header.
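The difference between the two observations can be made explicit by comparing the header names. The sets below are copied from the two outputs above; header values are ignored in this sketch because Accept, Accept-Language, and User-Agent differ between browsers anyway.

```python
# Header names from the GeckoDriver/Firefox output above.
firefox_headers = {
    "Accept", "Accept-Encoding", "Accept-Language",
    "Host", "Upgrade-Insecure-Requests", "User-Agent",
}
# Header names from the ChromeDriver/Chrome output above.
chrome_headers = {
    "Accept", "Accept-Encoding", "Accept-Language",
    "Host", "Referer", "Upgrade-Insecure-Requests", "User-Agent",
}

only_in_chrome = sorted(chrome_headers - firefox_headers)
only_in_firefox = sorted(firefox_headers - chrome_headers)
print(only_in_chrome)   # ['Referer']
print(only_in_firefox)  # []
```

Referer is the only header name that differs between the two runs.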


Outro

Referrer Policy

Vermination answered 4/6, 2019 at 15:49 Comment(4)
As per your conclusion, will the only way to get the Referer be via execute_script? - Rovner
The BrowserMob proxy isn't actively maintained and hasn't had a release in 3 years. Might I suggest the BrowserUp Proxy browserup.com/blog/… It is a drop-in replacement for the BrowserMob Proxy, but with added HTTP/2 and Brotli support, support up to Java 11 (BrowserMob only goes to 8), modern dependencies, and active maintainers. - Sputter
The current implementation of execute_script simply fails to add the Referer header. It has nothing to do with retrieving the headers from the conversion you are mentioning. - Godrich
@DebanjanB So now you changed your conclusion to "It seems to be an issue with GeckoDriver/Firefox in handling the Referer header". I mean, yes, but what is the issue? How can I fix it? Is it desired behavior or a bug? - Coachandfour
