How to get the HTTP status code using Selenium (Python)
A

14

52

I am writing a Selenium script in Python, but I can't find any information about:

How to get the HTTP status code from Selenium Python code.

Or am I missing something? If anyone has found a way, please feel free to post.

Angola answered 27/4, 2011 at 4:8 Comment(0)
N
53

It's Not Possible.

Unfortunately, Selenium does not provide this information by design. There is a very lengthy discussion about this, but the short of it is that:

  1. Selenium is a browser emulation tool, not necessarily a testing tool.
  2. Selenium performs many GETs and POSTs while rendering a page, and adding an interface for that would complicate the API in ways the authors resist.

We're left with hacks like:

  1. Look for error information in the returned HTML.
  2. Use another tool instead, like Requests (but see the shortcomings of that approach in @Zeinab's answer).
Norsworthy answered 6/8, 2014 at 14:9 Comment(3)
The only actual answer to the question asked. Thanks!Reformed
Your answer is wrong. Stefan Matei's answer and Jarad's answer get the status code.Alannaalano
Somewhat agree with the "by design" comment. It's common for scripts in the browser to trigger multiple requests after the initial one. It's not clear which response's status code is meant unless the browser keeps the first status code.Hydrops
K
16

I do not have much experience with Python. I have a more detailed Java example here:

https://mcmap.net/q/159642/-how-to-get-http-response-code-using-selenium-webdriver

The idea is to enable performance logging, which triggers "Network.enable" on chromedriver. Then get the performance log entries and parse them for "Network.responseReceived" messages.

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # enable performance logging (copy so the shared CHROME dict is not mutated)
    d = DesiredCapabilities.CHROME.copy()
    # newer chromedriver versions expect the key 'goog:loggingPrefs' instead
    d['loggingPrefs'] = { 'performance':'ALL' }

    driver = webdriver.Chrome(executable_path="c:\\windows\\chromedriver.exe", service_args=["--verbose", "--log-path=D:\\temp3\\chromedriverxx.log"], desired_capabilities=d)

    driver.get('https://api.ipify.org/?format=text')

    print(driver.title)

    print(driver.page_source)

    # get_log() drains the buffer, so fetch the entries once and reuse them
    performance_log = driver.get_log('performance')
    print(str(performance_log).strip('[]'))

    for entry in performance_log:
        print(entry)

The output will contain "Network.responseReceived" entries for your URL, for other requests made during the page load, and for redirect URLs. All you have to do is parse the log entries.

'{"message":{"method":"Network.responseReceived","params":{"frameId":"9488.1","loaderId":"9488.1","requestId":"9488.1","response":{"connectionId":14,"connectionReused":false,"encodedDataLength":-1,"fromDiskCache":false,"fromServiceWorker":false,"headers":{"Connection":"keep-alive","Content-Length":"13","Content-Type":"text/plain","Date":"Wed, 12 Oct 2016 06:15:47 GMT","Server":"Cowboy","Via":"1.1 vegur"},"headersText":"HTTP/1.1 200 OK\\r\\nServer: Cowboy\\r\\nConnection: keep-alive\\r\\nContent-Type: text/plain\\r\\nDate: Wed, 12 Oct 2016 06:15:47 GMT\\r\\nContent-Length:13\\r\\nVia:1.1vegur\\r\\n\\r\\n","mimeType":"text/plain","protocol":"http/1.1","remoteIPAddress":"54.197.246.207","remotePort":443,"requestHeaders":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8","Accept-Encoding":"gzip, deflate, sdch, br","Accept-Language":"en-GB,en-US;q=0.8,en;q=0.6","Connection":"keep-alive","Host":"api.ipify.org","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"},"requestHeadersText":"GET /?format=text HTTP/1.1\\r\\nHost: api.ipify.org\\r\\nConnection: keep-alive\\r\\nUpgrade-Insecure-Requests: 1\\r\\nUser-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36\\r\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\\r\\nAccept-Encoding: gzip, deflate, sdch, br\\r\\nAccept-Language: en-GB,en-US;q=0.8,en;q=0.6\\r\\n\\r\\n","securityDetails":{"certificateId":1,"certificateValidationDetails":{"numInvalidScts":0,"numUnknownScts":0,"numValidScts":0},"cipher":"AES_128_GCM","keyExchange":"ECDHE_RSA","protocol":"TLS 
1.2","signedCertificateTimestampList":[]},"securityState":"secure","status":200,"statusText":"OK","timing":{"connectEnd":320.508999997401,"connectStart":3.08100000256673,"dnsEnd":3.08100000256673,"dnsStart":0,"proxyEnd":-1,"proxyStart":-1,"pushEnd":0,"pushStart":0,"receiveHeadersEnd":465.725000001839,"requestTime":78246.775045,"sendEnd":320.995999994921,"sendStart":320.825999995577,"sslEnd":320.435000001453,"sslStart":141.675999999279,"workerReady":-1,"workerStart":-1},"url":"https://api.ipify.org/?format=text"},"timestamp":78247.242716,"type":"Document"}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948094, 'level': 'INFO', 'message': '{"message":{"method":"Network.dataReceived","params":{"dataLength":13,"encodedDataLength":171,"requestId":"9488.1","timestamp":78247.243137}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948094, 'level': 'INFO', 'message': '{"message":{"method":"Page.frameNavigated","params":{"frame":{"id":"9488.1","loaderId":"9488.1","mimeType":"text/plain","securityOrigin":"https://api.ipify.org","url":"https://api.ipify.org/?format=text"}}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948095, 'level': 'INFO', 'message': '{"message":{"method":"Network.loadingFinished","params":{"encodedDataLength":171,"requestId":"9488.1","timestamp":78247.242066}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948115, 'level': 'INFO', 'message': '{"message":{"method":"Page.loadEventFired","params":{"timestamp":78247.264169}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948115, 'level': 'INFO', 'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"9488.1"}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 147625298116, 'level': 'INFO', 'message': 
'{"message":{"method":"Page.domContentEventFired","params":{"timestamp":78247.276475}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948122, 'level': 'INFO', 'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://api.ipify.org/?format=text","frameId":"9488.1","initiator":{"type":"other"},"loaderId":"9488.1","request":{"headers":{"Referer":"https://api.ipify.org/?format=text","User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"},"initialPriority":"High","method":"GET","mixedContentType":"none","url":"https://api.ipify.org/favicon.ico"},"requestId":"9488.2","timestamp":78247.280131,"type":"Other","wallTime":1476252948.11805}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}

and get "status":200 from the json response. You can also parse the response "headers".
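A minimal sketch of that parsing step, assuming entries in the shape that driver.get_log('performance') returns (each one a dict with a JSON string under 'message'). The helper name document_status and the trimmed sample_entries are made up for illustration; with a live driver you would pass the real log instead.

```python
import json


def document_status(entries, url):
    """Return the status of the first Network.responseReceived event for `url`, or None."""
    for entry in entries:
        # Each entry's 'message' field is itself a JSON-encoded string.
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        response = message["params"]["response"]
        if response.get("url") == url:
            return response.get("status")
    return None


# Hypothetical, heavily trimmed entries (not real driver output).
sample_entries = [
    {"level": "INFO", "timestamp": 1, "message": json.dumps({
        "message": {"method": "Network.requestWillBeSent",
                    "params": {"request": {"url": "https://api.ipify.org/?format=text"}}}})},
    {"level": "INFO", "timestamp": 2, "message": json.dumps({
        "message": {"method": "Network.responseReceived",
                    "params": {"type": "Document",
                               "response": {"url": "https://api.ipify.org/?format=text",
                                            "status": 200, "statusText": "OK"}}}})},
]

print(document_status(sample_entries, "https://api.ipify.org/?format=text"))
```

Matching on the exact URL is a simplification: after a redirect the final URL differs from the one you requested, so you may prefer to compare against driver.current_url.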

Kurtis answered 12/10, 2016 at 6:31 Comment(3)
get error on mac: selenium.common.exceptions.WebDriverException: Message: POST /session/4fd2b36a-6c9a-e34d-8e9a-022424c7f36f/log did not match a known commandPreterit
@Preterit It works only for Chrome. Usually this error is thrown when using other browsers (like Firefox). For Firefox you have to dump a log file and then parse it java exampleKurtis
This does not work in Chrome today, at least with Perl. It says this is not a W3C command.Duck
D
11
import json
from selenium.webdriver.chrome.webdriver import WebDriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

chromedriver_path = "YOUR/PATH/TO/chromedriver.exe"
url = "https://selenium-python.readthedocs.io/api.html"
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['goog:loggingPrefs'] = {'performance': 'ALL'}

browser = WebDriver(chromedriver_path, desired_capabilities=capabilities)

browser.get(url)
logs = browser.get_log('performance')

Option 1: if you just want the status code, assuming the page you want it for appears in the log with a 'text/html' content type

def get_status(logs):
    for log in logs:
        if log['message']:
            d = json.loads(log['message'])
            try:
                content_type = 'text/html' in d['message']['params']['response']['headers']['content-type']
                response_received = d['message']['method'] == 'Network.responseReceived'
                if content_type and response_received:
                    return d['message']['params']['response']['status']
            except KeyError:
                # not a response event, or no header stored under the exact key 'content-type'
                pass

Usage:

>>> get_status(logs)
200

Option 2: if you want to see all status codes in the relevant logs

def get_status_codes(logs):
    statuses = []
    for log in logs:
        if log['message']:
            d = json.loads(log['message'])
            if d['message'].get('method') == "Network.responseReceived":
                statuses.append(d['message']['params']['response']['status'])
    return statuses

Usage:

>>> get_status_codes(logs)
[200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]

Note 1: much of this is based on @Stefan Matei's answer; however, a few things have changed between Chrome versions, and I provide an idea of how to parse the logs.

Note 2: ['content-type'] is not fully reliable. Casing can change. Inspect for your use case.
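One way to sidestep the casing problem from Note 2 is to look the header up case-insensitively. This is only a sketch: get_header and is_html are hypothetical helper names, and headers is assumed to be the plain dict found under ['response']['headers'] in a log message.

```python
def get_header(headers, name, default=None):
    """Look up an HTTP header by name, ignoring case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


def is_html(headers):
    """True if the Content-Type header (in any casing) mentions text/html."""
    return "text/html" in get_header(headers, "content-type", "").lower()


print(is_html({"Content-Type": "text/html; charset=utf-8"}))
print(is_html({"content-type": "application/json"}))
```

In get_status above, the hard-coded ['headers']['content-type'] lookup could then be swapped for is_html(d['message']['params']['response']['headers']).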

Dail answered 14/9, 2020 at 0:6 Comment(1)
Check d['message']['params']['requestId'] instead of d['message']['params']['response']['headers']['content-type']Genevieve
A
4

I will refer you to a question I asked earlier: How to detect when Selenium loads a browser's error page

The short of it is that unless you want to get fancy with something like a Squid proxy or BrowserMob, you have to go for a dirty solution like the one below.

Replace

driver.get( "http://google.com" )

with

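from selenium.webdriver.common.by import By

def goTo( url ):
    # load the page first, then check whether the browser rendered its error page
    driver.get( url )
    ids = [ elem.get_attribute("id") for elem in driver.find_elements(By.CSS_SELECTOR, "body > div") ]
    if "errorPageContainer" in ids:
        raise Exception( "this page is an error" )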

You can get creative and get the error code based on the text displayed in the actual browser. This will have to be customized based on the browser; the one above works for Firefox.

The only way this becomes problematic is with 404's (page not found), since many sites have their own error pages and you have to customize it for each one.

Ancillary answered 24/6, 2014 at 23:53 Comment(0)
B
4

It seems to be possible to get the response status code from the log via the API.

from selenium import webdriver
import json
browser = webdriver.PhantomJS()
browser.get('http://www.google.fr')
har = json.loads(browser.get_log('har')[0]['message'])
har['log']['entries'][0]['response']['status']
har['log']['entries'][0]['response']['statusText']
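The snippet above only looks at the first entry. As a rough sketch, assuming the parsed log follows the usual HAR layout (log → entries → request/response), every status can be listed like this; har_statuses and sample_har are hypothetical names and data, not output from a real browser session.

```python
def har_statuses(har):
    """Return (url, status, statusText) for every entry in a parsed HAR dict."""
    return [
        (entry["request"]["url"],
         entry["response"]["status"],
         entry["response"].get("statusText", ""))
        for entry in har.get("log", {}).get("entries", [])
    ]


# Hypothetical HAR fragment, trimmed to the fields used above.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "http://example.com/"},
             "response": {"status": 200, "statusText": "OK"}},
            {"request": {"url": "http://example.com/missing.css"},
             "response": {"status": 404, "statusText": "Not Found"}},
        ]
    }
}

for url, status, text in har_statuses(sample_har):
    print(url, status, text)
```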
Brightman answered 28/11, 2017 at 14:0 Comment(7)
Is there anything browser specific about "the log" or can that code work on all browsers?Dilatation
I tested it only with PhantomJS. I don't know about IE, but I think it should be possible with Chrome.Brightman
I received selenium.common.exceptions.WebDriverException: Message: unknown error: log type 'har' not foundEllipticity
@Ellipticity That means har is not defined by you. You can do capabilities['loggingPrefs'] = {'har': 'ALL'} to add it. :-)Bask
@Bask how and where in the code would you fit capabilities['loggingPrefs'] = {'har': 'ALL'}?Blithering
@MarcinKulik do from selenium.webdriver.common.desired_capabilities import DesiredCapabilities and you can create a variable capabilities = DesiredCapabilities.FIREFOX now you can use this to modify the capabilities of the browser. Hope it helps. This may serve as an example: github.com/openwisp/docker-openwisp/blob/…Bask
For the Chrome driver, I added in: capabilities = DesiredCapabilities.CHROME capabilities['goog:loggingPrefs'] = {'har': 'ALL'} driver = webdriver.Chrome('./chromedriver', options=options, desired_capabilities=capabilities) But I still get a log type 'har' not found error.Varden
A
3

To get a status code from a URL using Selenium, you can use JavaScript and an XMLHttpRequest object. The WebDriver class has an execute_async_script() method that you can call to execute JavaScript code within the browser:

from selenium import webdriver

driver = webdriver.Chrome(executable_path=r"C:\ChromeDriver\chromedriver.exe")  # raw string keeps the backslashes literal
driver.get('https://stackoverflow.com/')

js = '''
let callback = arguments[0];
let xhr = new XMLHttpRequest();
xhr.open('GET', 'https://stackoverflow.com/', true);
xhr.onload = function () {
    if (this.readyState === 4) {
        callback(this.status);
    }
};
xhr.onerror = function () {
    callback('error');
};
xhr.send(null);
'''

status_code = driver.execute_async_script(js)
print(status_code)    # 200

driver.close()

More information about execute_async_script method.

Amery answered 23/1, 2020 at 9:30 Comment(1)
It seems good for the GET method. But is there any way to check the response code for a form submission on a page that uses the POST method?Clean
E
2

You can also inspect the last message in the log for an error status code: print(browser.get_log('browser')[-1]['message'])

Ellipticity answered 13/12, 2018 at 18:21 Comment(0)
M
1

Don't ever say anything isn't possible. The top-voted answer is horrible. There are many other answers that lead to possible solutions, but I will share how I personally implemented this, which is based off of another Stack Overflow answer.

Tested using Google Chrome. The specifics for Firefox or PhantomJS may be a bit different.

I created a method for checking the response status code for any URL that you have visited. I'm sure that it could be cleaned up, but it works:

import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities.CHROME
capabilities['goog:loggingPrefs'] = {'performance': 'ALL'}

driver = webdriver.Chrome(desired_capabilities=capabilities)


def get_status_code(url):
    for entry in driver.get_log('performance'):
        for k, v in entry.items():
            if k == 'message' and 'status' in v:
                msg = json.loads(v)['message']['params']
                for mk, mv in msg.items():
                    if mk == 'response':
                        response_url = mv['url']
                        response_status = mv['status']
                        if response_url == url:
                            return response_status


print(get_status_code(driver.current_url))

Output:

200

Mesocarp answered 28/10, 2021 at 16:53 Comment(0)
P
1

In the meantime, there is a Python library called selenium-wire

pip install selenium-wire

It will let you do this for example:

from seleniumwire import webdriver

url = 'https://stackoverflow.com'
driver = webdriver.Chrome()
driver.get(url)

for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )
Puett answered 13/9, 2022 at 18:49 Comment(0)
D
0

I'm using Java here as I haven't got much experience in Python. Also, I don't know how to get only the HTTP status codes. The following will give you the entire network traffic; you can capture status codes from it.

First start your server as

selenium.start("captureNetworkTraffic=true");

Then capture your traffic as

String traffic = selenium.captureNetworkTraffic("xml");

You can get output in json as well.

Dinnie answered 27/4, 2011 at 6:35 Comment(0)
B
0

I've been surfing the net for about 3 hours and have not found a single way to do that with WebDriver. I have never worked with Selenium directly. The only suggestion that came to my mind is to use the "requests" module like this:

import requests
from selenium import webdriver

driver = webdriver.Chrome()  # any browser driver works here
driver.get("url")
r = requests.get("url")
print(r.status_code)

A complete tutorial on using requests is here, and you can install the module with the command pip install requests.

But there is a caveat: the driver's request and the requests call are two separate requests, so their responses are not necessarily the same. You only get the status code of the requests call, and if the URL's responses are not stable, that may give wrong results.

Borodino answered 8/10, 2013 at 11:16 Comment(5)
So problematic. On top of all those problems, you GET the request twice, and further, this can't be used if you plan on having Selenium GET or POST urls.Norsworthy
Thanks for nice answer. pypi.python.org/pypi/selenium-requests also does the same stuff.Knit
Unfortunately, some websites will give different response codes to these different modules. Something to be aware of (try user-agent spoofing if you're relying on requests for status codes used in Selenium)Tyburn
Actually, it's a good idea to use an HTTP request and determine whether the URL is valid or not.Patch
When a page blocks direct requests and only allows loading the page via a browser, you get 403, so not helpful in this caseFrenulum
Q
0
from seleniumwire import webdriver

driver = webdriver.Chrome()  # the driver must be created before use
link_project = "url"
driver.get(link_project)
status_url = ""
for request in driver.requests:
    if request.response:
        link_res_project = request.url
        if link_res_project == link_project:
            status_url = int(request.response.status_code)
            print(f"Status code: {status_url}")
Quincuncial answered 30/1 at 15:33 Comment(0)
F
-2

You can get the status code from the title

For example, 403 Forbidden response from nginx.

<html>
    <head>
        <title>403 Forbidden</title>
    </head>
    <body></body>
</html>

Selenium code:

text = driver.title  # .text on the hidden <title> element returns an empty string
if '403 Forbidden' in text:
    print('[INFO] status code is 403')

Of course, this solution does not cover all cases.
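To go slightly beyond a single hard-coded string, the same idea can be sketched with a regular expression that accepts any three-digit code at the very start of the title. This is a heuristic under the assumption that the server's default error page puts the code first, as nginx does above; status_from_title is a hypothetical helper, and it will still misfire on ordinary pages whose titles happen to start with such a number.

```python
import re

# A status-like number (100-599) at the very start of the title.
TITLE_STATUS = re.compile(r"^\s*([1-5]\d{2})\b")


def status_from_title(title):
    """Guess an HTTP status code from a page title, or return None."""
    match = TITLE_STATUS.match(title or "")
    return int(match.group(1)) if match else None


print(status_from_title("403 Forbidden"))
print(status_from_title("How to write the best microcopy for 403 Forbidden pages"))
```

With a live driver you would call status_from_title(driver.title).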

Fulviah answered 12/12, 2019 at 15:20 Comment(1)
No. How would you go about "nenalezeno", "página no encontrada", "страница не найдена", and so on...? Even if you're lucky enough that the title states the error at all. And even better, how about a (found) article titled "How to write the best microcopy for 403 Forbidden pages"?Cordiality
O
-2

I used the following trick: use requests to make sure that the server is responding first, then use the driver:

import requests
from bs4 import BeautifulSoup

# keep retrying until the server answers with 200
resp = requests.get(link)
while resp.status_code != 200:
    resp = requests.get(link)

driver.get(link)
html = driver.page_source

soup = BeautifulSoup(html, "html.parser")
Oogenesis answered 17/8, 2020 at 2:28 Comment(0)
