I am writing a Selenium script in Python, but I can't find any information about:
How to get the HTTP status code from Selenium Python code.
Or am I missing something? If anyone has found a way, please feel free to post it.
Unfortunately, Selenium does not provide this information by design. There is a very lengthy discussion about this, but the short of it is that:
We're left with hacks like:
I don't have much experience with Python. I have a more detailed Java example here:
https://mcmap.net/q/159642/-how-to-get-http-response-code-using-selenium-webdriver
The idea is to enable performance logging, which triggers "Network.enable" in chromedriver. Then get the performance log entries and parse them for the "Network.responseReceived" message.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# enable browser logging (copy so the shared CHROME defaults are not modified)
d = DesiredCapabilities.CHROME.copy()
d['loggingPrefs'] = { 'performance':'ALL' }
driver = webdriver.Chrome(executable_path="c:\\windows\\chromedriver.exe", service_args=["--verbose", "--log-path=D:\\temp3\\chromedriverxx.log"], desired_capabilities=d)
driver.get('https://api.ipify.org/?format=text')
print(driver.title)
print(driver.page_source)
# get_log() hands back the buffered entries and clears them, so read them once and reuse the list
performance_log = driver.get_log('performance')
print(str(performance_log).strip('[]'))
for entry in performance_log:
    print(entry)
The output will contain "Network.responseReceived" entries for your URL, for other requests made during the page load, and for any redirect URLs. All you have to do is parse the log entries.
'{"message":{"method":"Network.responseReceived","params":{"frameId":"9488.1","loaderId":"9488.1","requestId":"9488.1","response":{"connectionId":14,"connectionReused":false,"encodedDataLength":-1,"fromDiskCache":false,"fromServiceWorker":false,"headers":{"Connection":"keep-alive","Content-Length":"13","Content-Type":"text/plain","Date":"Wed, 12 Oct 2016 06:15:47 GMT","Server":"Cowboy","Via":"1.1 vegur"},"headersText":"HTTP/1.1 200 OK\\r\\nServer: Cowboy\\r\\nConnection: keep-alive\\r\\nContent-Type: text/plain\\r\\nDate: Wed, 12 Oct 2016 06:15:47 GMT\\r\\nContent-Length:13\\r\\nVia:1.1vegur\\r\\n\\r\\n","mimeType":"text/plain","protocol":"http/1.1","remoteIPAddress":"54.197.246.207","remotePort":443,"requestHeaders":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8","Accept-Encoding":"gzip, deflate, sdch, br","Accept-Language":"en-GB,en-US;q=0.8,en;q=0.6","Connection":"keep-alive","Host":"api.ipify.org","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"},"requestHeadersText":"GET /?format=text HTTP/1.1\\r\\nHost: api.ipify.org\\r\\nConnection: keep-alive\\r\\nUpgrade-Insecure-Requests: 1\\r\\nUser-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36\\r\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\\r\\nAccept-Encoding: gzip, deflate, sdch, br\\r\\nAccept-Language: en-GB,en-US;q=0.8,en;q=0.6\\r\\n\\r\\n","securityDetails":{"certificateId":1,"certificateValidationDetails":{"numInvalidScts":0,"numUnknownScts":0,"numValidScts":0},"cipher":"AES_128_GCM","keyExchange":"ECDHE_RSA","protocol":"TLS 
1.2","signedCertificateTimestampList":[]},"securityState":"secure","status":200,"statusText":"OK","timing":{"connectEnd":320.508999997401,"connectStart":3.08100000256673,"dnsEnd":3.08100000256673,"dnsStart":0,"proxyEnd":-1,"proxyStart":-1,"pushEnd":0,"pushStart":0,"receiveHeadersEnd":465.725000001839,"requestTime":78246.775045,"sendEnd":320.995999994921,"sendStart":320.825999995577,"sslEnd":320.435000001453,"sslStart":141.675999999279,"workerReady":-1,"workerStart":-1},"url":"https://api.ipify.org/?format=text"},"timestamp":78247.242716,"type":"Document"}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948094, 'level': 'INFO', 'message': '{"message":{"method":"Network.dataReceived","params":{"dataLength":13,"encodedDataLength":171,"requestId":"9488.1","timestamp":78247.243137}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948094, 'level': 'INFO', 'message': '{"message":{"method":"Page.frameNavigated","params":{"frame":{"id":"9488.1","loaderId":"9488.1","mimeType":"text/plain","securityOrigin":"https://api.ipify.org","url":"https://api.ipify.org/?format=text"}}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948095, 'level': 'INFO', 'message': '{"message":{"method":"Network.loadingFinished","params":{"encodedDataLength":171,"requestId":"9488.1","timestamp":78247.242066}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948115, 'level': 'INFO', 'message': '{"message":{"method":"Page.loadEventFired","params":{"timestamp":78247.264169}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948115, 'level': 'INFO', 'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"9488.1"}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 147625298116, 'level': 'INFO', 'message': 
'{"message":{"method":"Page.domContentEventFired","params":{"timestamp":78247.276475}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948122, 'level': 'INFO', 'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://api.ipify.org/?format=text","frameId":"9488.1","initiator":{"type":"other"},"loaderId":"9488.1","request":{"headers":{"Referer":"https://api.ipify.org/?format=text","User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"},"initialPriority":"High","method":"GET","mixedContentType":"none","url":"https://api.ipify.org/favicon.ico"},"requestId":"9488.2","timestamp":78247.280131,"type":"Other","wallTime":1476252948.11805}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}
and read "status":200 from the JSON response. You can also parse the response "headers".
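A minimal sketch of that parsing step, assuming each entry is a dict whose 'message' value is a JSON string shaped like the sample above. parse_response_received is a hypothetical helper name, not part of Selenium, and the sample entry is a trimmed-down, made-up copy of the real output:

```python
import json


def parse_response_received(entry):
    """Return (url, status, headers) for a Network.responseReceived entry, else None.

    `entry` is assumed to be one item of driver.get_log('performance'):
    a dict whose 'message' value is a JSON string.
    """
    message = json.loads(entry["message"])["message"]
    if message.get("method") != "Network.responseReceived":
        return None
    response = message["params"]["response"]
    return response["url"], response["status"], response.get("headers", {})


# A trimmed-down entry in the same shape as the sample output above
sample = {
    "level": "INFO",
    "message": json.dumps({
        "message": {
            "method": "Network.responseReceived",
            "params": {"response": {
                "url": "https://api.ipify.org/?format=text",
                "status": 200,
                "headers": {"Content-Type": "text/plain"},
            }},
        }
    }),
}
print(parse_response_received(sample))
# ('https://api.ipify.org/?format=text', 200, {'Content-Type': 'text/plain'})
```

With a real driver you would call it on each item of driver.get_log('performance') and skip the None results.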
selenium.common.exceptions.WebDriverException: Message: POST /session/4fd2b36a-6c9a-e34d-8e9a-022424c7f36f/log did not match a known command
import json
from selenium.webdriver.chrome.webdriver import WebDriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
chromedriver_path = "YOUR/PATH/TO/chromedriver.exe"
url = "https://selenium-python.readthedocs.io/api.html"
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['goog:loggingPrefs'] = {'performance': 'ALL'}
browser = WebDriver(chromedriver_path, desired_capabilities=capabilities)
browser.get(url)
logs = browser.get_log('performance')
Option 1: if you just want the status code of the page itself, assuming its response is the entry in the log with a 'text/html' content type:
def get_status(logs):
    for log in logs:
        if log['message']:
            d = json.loads(log['message'])
            try:
                content_type = 'text/html' in d['message']['params']['response']['headers']['content-type']
                response_received = d['message']['method'] == 'Network.responseReceived'
                if content_type and response_received:
                    return d['message']['params']['response']['status']
            except KeyError:
                # not a response entry, or no lowercase 'content-type' header
                pass
Usage:
>>> get_status(logs)
200
Option 2: if you want to see all status codes in the relevant logs:
def get_status_codes(logs):
    statuses = []
    for log in logs:
        if log['message']:
            d = json.loads(log['message'])
            if d['message'].get('method') == "Network.responseReceived":
                statuses.append(d['message']['params']['response']['status'])
    return statuses
Usage:
>>> get_status_codes(logs)
[200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
Note 1: much of this is based on @Stefan Matei's answer; however, a few things have changed between Chrome versions, and I provide an idea of how to parse the logs.
Note 2: the ['content-type'] lookup is not fully reliable, because the casing of header names can change. Inspect the logs for your use case.
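Since header-name casing varies (the sample output earlier has "Content-Type", while get_status looks up 'content-type'), one option is a case-insensitive lookup over the plain headers dict. A small sketch; get_header is a hypothetical helper name:

```python
def get_header(headers, name, default=None):
    """Look up an HTTP header in a plain dict, ignoring the case of the header name."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


headers = {"Content-Type": "text/html; charset=utf-8"}
print(get_header(headers, "content-type"))  # text/html; charset=utf-8
```

Inside get_status you could then test 'text/html' in get_header(d['message']['params']['response']['headers'], 'content-type', '') instead of indexing the dict directly.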
Alternatively, match on d['message']['params']['requestId'] instead of d['message']['params']['response']['headers']['content-type'].
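One way to act on the requestId idea, sketched under assumptions: in the sample output earlier, the Network.responseReceived params carry a "requestId" and a "type" ("Document" for the page itself). Grouping by requestId and keeping only Document responses avoids depending on the content-type header. document_statuses is a hypothetical name and the entries below are synthetic; note that iframes are also loaded as documents, so you may get more than one entry:

```python
import json


def document_statuses(logs):
    """Map requestId -> (url, status) for responses whose resource type is 'Document'."""
    result = {}
    for log in logs:
        message = json.loads(log["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        params = message["params"]
        if params.get("type") == "Document":
            response = params["response"]
            result[params["requestId"]] = (response["url"], response["status"])
    return result


def _entry(method, params):
    # Build a fake performance-log entry in the same shape Chrome returns
    return {"message": json.dumps({"message": {"method": method, "params": params}})}


logs = [
    _entry("Network.responseReceived", {
        "requestId": "9488.1", "type": "Document",
        "response": {"url": "https://api.ipify.org/?format=text", "status": 200},
    }),
    _entry("Network.responseReceived", {
        "requestId": "9488.2", "type": "Other",
        "response": {"url": "https://api.ipify.org/favicon.ico", "status": 404},
    }),
]
print(document_statuses(logs))
```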
I will refer you to a question I asked earlier: How to detect when Selenium loads a browser's error page.
The short of it is that unless you want to get fancy with something like a Squid proxy or BrowserMob Proxy, you have to go for a dirty solution like the one below.
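Replace
driver.get("http://google.com")
with
from selenium.webdriver.common.by import By

def goTo(url):
    # navigate first, then inspect the page that was actually loaded
    driver.get(url)
    # Firefox's built-in error page has a top-level div with id "errorPageContainer"
    ids = [elem.get_attribute("id") for elem in driver.find_elements(By.CSS_SELECTOR, "body > div")]
    if "errorPageContainer" in ids:
        raise Exception("this page is an error")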
You can get creative and derive the error code from the text displayed in the actual browser. This will have to be customized per browser; the check above works for Firefox.
This mainly becomes problematic with 404s (page not found), since many sites serve their own error pages and you have to customize the check for each one.
It seems possible to get the response status code from the log via the API:
from selenium import webdriver
import json
browser = webdriver.PhantomJS()
browser.get('http://www.google.fr')
har = json.loads(browser.get_log('har')[0]['message'])
har['log']['entries'][0]['response']['status']
har['log']['entries'][0]['response']['statusText']
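The har['log']['entries'] list follows the HAR format, where every entry has a request and a response, so you are not limited to the first one. A sketch that lists every status; har_statuses is a hypothetical helper and sample_har is made-up data standing in for json.loads(browser.get_log('har')[0]['message']):

```python
def har_statuses(har):
    """Return (url, status, statusText) for every entry in a HAR dict."""
    return [
        (entry["request"]["url"], entry["response"]["status"], entry["response"]["statusText"])
        for entry in har["log"]["entries"]
    ]


sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "http://www.google.fr/"},
             "response": {"status": 301, "statusText": "Moved Permanently"}},
            {"request": {"url": "https://www.google.fr/"},
             "response": {"status": 200, "statusText": "OK"}},
        ]
    }
}
for url, status, text in har_statuses(sample_har):
    print(status, text, url)
```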
selenium.common.exceptions.WebDriverException: Message: unknown error: log type 'har' not found
You haven't defined the har log type. You can add it with capabilities['loggingPrefs'] = {'har': 'ALL'}. :-)
capabilities['loggingPrefs'] = {'har': 'ALL'}?
Use from selenium.webdriver.common.desired_capabilities import DesiredCapabilities, and then you can create a variable capabilities = DesiredCapabilities.FIREFOX. Now you can use it to modify the capabilities of the browser. Hope it helps. This may serve as an example: github.com/openwisp/docker-openwisp/blob/…
I tried:
capabilities = DesiredCapabilities.CHROME
capabilities['goog:loggingPrefs'] = {'har': 'ALL'}
driver = webdriver.Chrome('./chromedriver', options=options, desired_capabilities=capabilities)
But I still get a log type 'har' not found error.
To get the status code for a URL using Selenium, you can use JavaScript and an XMLHttpRequest object. The WebDriver class has an execute_async_script() method that you can call to execute JavaScript code within the browser:
from selenium import webdriver
# raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(executable_path=r"C:\ChromeDriver\chromedriver.exe")
driver.get('https://stackoverflow.com/')
js = '''
let callback = arguments[0];
let xhr = new XMLHttpRequest();
xhr.open('GET', 'https://stackoverflow.com/', true);
xhr.onload = function () {
    if (this.readyState === 4) {
        callback(this.status);
    }
};
xhr.onerror = function () {
    callback('error');
};
xhr.send(null);
'''
status_code = driver.execute_async_script(js)
print(status_code)  # 200
driver.close()
More information about the execute_async_script method is in the Selenium documentation.
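The same XHR idea can be wrapped in a reusable function that passes the URL in as a script argument instead of hard-coding it. With execute_async_script, extra Python arguments arrive as arguments[0], arguments[1], and so on, and Selenium appends its callback as the last argument. A sketch; fetch_status and FETCH_STATUS_JS are hypothetical names. Keep in mind this reports what a second XHR request sees, not the original page load, and a cross-origin URL without CORS headers will usually end up in onerror:

```python
FETCH_STATUS_JS = """
const url = arguments[0];
const callback = arguments[arguments.length - 1];  // Selenium's async callback
const xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
xhr.onload = function () { callback(xhr.status); };
xhr.onerror = function () { callback('error'); };
xhr.send(null);
"""


def fetch_status(driver, url):
    """Ask the browser to request `url` via XHR and return its status code (or 'error')."""
    return driver.execute_async_script(FETCH_STATUS_JS, url)
```

Usage: status = fetch_status(driver, 'https://stackoverflow.com/') after the driver has loaded a page on the same site.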
You can also inspect the last message in the browser log for an error status code:
print(browser.get_log('browser')[-1]['message'])
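If you only care about failures, the browser log can be scanned for Chrome's "Failed to load resource: the server responded with a status of NNN" console errors. A sketch, assuming that exact Chrome wording (other browsers phrase it differently or don't log it); error_statuses is a hypothetical helper and sample_log is made-up data. Successful responses produce no console message, so a 200 leaves nothing to find:

```python
import re

# Assumed wording of Chrome's console error for failed resource loads
_FAILED_LOAD = re.compile(r"the server responded with a status of (\d{3})")


def error_statuses(browser_log):
    """Extract HTTP status codes from 'Failed to load resource' console messages."""
    codes = []
    for entry in browser_log:
        match = _FAILED_LOAD.search(entry.get("message", ""))
        if match:
            codes.append(int(match.group(1)))
    return codes


sample_log = [
    {"level": "SEVERE",
     "message": "https://example.com/missing - Failed to load resource: "
                "the server responded with a status of 404 (Not Found)"},
    {"level": "INFO", "message": "console.log output"},
]
print(error_statuses(sample_log))  # [404]
```

With a real browser you would pass browser.get_log('browser') instead of sample_log.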
Don't ever say anything isn't possible. The top-voted answer is horrible. There are many other answers that lead to possible solutions, but I will share how I personally implemented this, which is based on another Stack Overflow answer.
Tested using Google Chrome. The specifics for Firefox or PhantomJS may be a bit different.
I created a method for checking the response status code for any URL that you have visited. I'm sure it could be cleaned up, but it works:
import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# copy so the shared CHROME defaults are not modified
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['goog:loggingPrefs'] = {'performance': 'ALL'}
driver = webdriver.Chrome(desired_capabilities=capabilities)

def get_status_code(url):
    for entry in driver.get_log('performance'):
        for k, v in entry.items():
            if k == 'message' and 'status' in v:
                msg = json.loads(v)['message']['params']
                for mk, mv in msg.items():
                    if mk == 'response':
                        response_url = mv['url']
                        response_status = mv['status']
                        if response_url == url:
                            return response_status

# load a page with driver.get(...) before checking its status
print(get_status_code(driver.current_url))
Output:
200
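One caveat with calling driver.get_log('performance') inside the function: as far as I know, each call returns only the entries logged since the previous call (the old JSON Wire Protocol describes the log buffer as reset after each request), so a second get_status_code call for the same page can come back empty. A sketch of a small collector that keeps everything it has read; PerformanceLog is a hypothetical helper, and any object with a get_log method works as the driver:

```python
import json


class PerformanceLog:
    """Accumulate performance-log entries across get_log() calls."""

    def __init__(self, driver):
        self.driver = driver
        self.entries = []

    def refresh(self):
        # Drain whatever is new in the browser's buffer and keep it locally
        self.entries.extend(self.driver.get_log("performance"))
        return self.entries

    def status_for(self, url):
        """Return the status of the most recent response whose URL equals `url`, else None."""
        status = None
        for entry in self.refresh():
            message = json.loads(entry["message"])["message"]
            if message.get("method") == "Network.responseReceived":
                response = message["params"]["response"]
                if response["url"] == url:
                    status = response["status"]
        return status
```

Usage: create perf = PerformanceLog(driver) once, then after each driver.get(...) call perf.status_for(driver.current_url) as often as you like.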
In the meantime, there is a Python library called selenium-wire:
pip install selenium-wire
It lets you do this, for example:
from seleniumwire import webdriver
url = 'https://stackoverflow.com'
driver = webdriver.Chrome()
driver.get(url)
# selenium-wire records every request the browser made
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )
I'm using Java here, as I don't have much experience with Python. Also, I don't know how to get only the HTTP status codes. The following will give you the entire network traffic, and you can capture the status codes from it.
First, start your server with
selenium.start("captureNetworkTraffic=true");
Then capture your traffic with
String traffic = selenium.captureNetworkTraffic("xml");
You can get the output in JSON as well.
I've been searching the net for about 3 hours and found not a single way to do this with WebDriver. I've never worked with Selenium directly. The only suggestion that comes to mind is to use the requests module, like this:
import requests
from selenium import webdriver
url = "url"  # the page you want to check
driver = webdriver.Chrome()
driver.get(url)
r = requests.get(url)
print(r.status_code)
A complete tutorial on using requests is here, and you can install the module with the command pip install requests.
But there is a caveat, even if it doesn't always bite: the driver's response and the requests response are not the same. You only get the status code of the separate requests call, so if the URL's responses are not stable, this can give wrong results.
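If you do go the requests route, one way to narrow that gap is to send the browser's cookies and User-Agent along with the separate request. A sketch, assuming driver.get_cookies() returns dicts with 'name' and 'value' keys and that navigator.userAgent is readable through execute_script; browser_like_kwargs is a hypothetical name. It is still a second request, so an unstable server can still answer it differently:

```python
def browser_like_kwargs(driver):
    """Build keyword arguments for requests.get() that mimic the Selenium browser session."""
    cookies = {c["name"]: c["value"] for c in driver.get_cookies()}
    user_agent = driver.execute_script("return navigator.userAgent;")
    return {"cookies": cookies, "headers": {"User-Agent": user_agent}}
```

Usage: r = requests.get(driver.current_url, **browser_like_kwargs(driver)), then print(r.status_code).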
from seleniumwire import webdriver
driver = webdriver.Chrome()
link_project = "url"
driver.get(link_project)
status_url = ""
for request in driver.requests:
    if request.response:
        link_res_project = request.url
        if link_res_project == link_project:
            status_url = int(request.response.status_code)
print(f"Status code: {status_url}")
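A variation on the loop above, sketched as a plain function over driver.requests: it skips requests that never got a response and tolerates a trailing-slash difference between the URL you passed and the one recorded. The trailing-slash normalization is my own assumption about what commonly differs, and status_for_url is a hypothetical name:

```python
def status_for_url(requests_seen, url):
    """Return the status code of the last captured request for `url`, or None.

    `requests_seen` is expected to be selenium-wire's driver.requests: objects
    with a .url attribute and a .response that is None if no reply arrived.
    """
    target = url.rstrip("/")
    status = None
    for req in requests_seen:
        if req.response is not None and req.url.rstrip("/") == target:
            status = req.response.status_code
    return status
```

Usage: print(f"Status code: {status_for_url(driver.requests, link_project)}").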
You can get the status code from the page title.
For example, a 403 Forbidden response from nginx looks like this:
<html>
<head>
<title>403 Forbidden</title>
</head>
<body></body>
</html>
Selenium code:
# <title> is not rendered, so element.text on it is usually empty; driver.title reads it directly
text = driver.title
if '403 Forbidden' in text:
    print('[INFO] status code is 403')
Of course, this approach does not cover all cases.
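A slightly more general version of the same trick, assuming the error page's title starts with a three-digit code the way nginx's default pages do ("403 Forbidden", "404 Not Found"). status_from_title is a hypothetical helper; sites with custom error pages will simply not match:

```python
import re


def status_from_title(title):
    """Return the leading HTTP status code in a title like '403 Forbidden', else None."""
    match = re.match(r"\s*([1-5]\d{2})\b", title)
    return int(match.group(1)) if match else None


print(status_from_title("403 Forbidden"))   # 403
print(status_from_title("Stack Overflow"))  # None
```

Usage: status_from_title(driver.title).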
I used the following trick: use requests to make sure that the server is responding first, then use the driver:
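import requests
from bs4 import BeautifulSoup

# assumes `link` (the URL) and `driver` (a started WebDriver) already exist
resp = requests.get(link)
while resp.status_code != 200:
    # retry until the server answers 200 (loops forever if it never does)
    resp = requests.get(link)
driver.get(link)
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")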
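That retry loop never gives up if the server keeps failing. A sketch of a bounded version with a pause between attempts; wait_for_ok, the default attempt count and the delay are illustrative choices, and `probe` is any callable that returns a status code (for example lambda: requests.get(link).status_code):

```python
import time


def wait_for_ok(probe, attempts=5, delay=1.0):
    """Call probe() until it returns 200 or the attempts run out; return the last status."""
    status = None
    for attempt in range(attempts):
        status = probe()
        if status == 200:
            break
        if attempt < attempts - 1:
            time.sleep(delay)  # pause before the next try
    return status
```

Usage: if wait_for_ok(lambda: requests.get(link).status_code) == 200: call driver.get(link) and continue with page_source as above.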
© 2022 - 2024 — McMap. All rights reserved.