How to get browser network logs using python selenium
P

7

11

I'm trying to get browser network logs using Selenium to debug requests and responses. Could you please help me find a way to do this?

I'm using Selenium 3.14.0 and the latest Chrome browser.

Portrait answered 13/11, 2018 at 17:49 Comment(1)
For future readers, since this question is one of the first that comes up when trying to find the answer, selenium-wire is what you are looking for. That way you don't need a proxy. Just thought I could save you some time searching. One more note, use request.response.body, not request.body. I had to experiment to see why my body was empty and found out the documentation needs to be updated. :)Dean
A
24

Using python + selenium + firefox

Don't set up a proxy unless you have to. To capture outbound API requests, I used the solution from this answer, but in Python: https://mcmap.net/q/443849/-using-selenium-get-network-request-not-working-properly-closed

test = driver.execute_script("var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return network;")

for item in test:
  print(item)

You get an array of dicts.

This allows me to see all the network requests made. I'm using it to parse out a parameter from one of the requests so that I can use it to make my own requests against the API.
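
As a rough sketch of the parsing step described above, the entries returned by execute_script can be searched like any list of dicts. The helper name find_query_param, the example URLs and the parameter name are hypothetical, and the sketch assumes each dict carries the request URL under the "name" key, as Resource Timing entries do.

```python
from urllib.parse import parse_qs, urlparse


def find_query_param(entries, url_fragment, param):
    """Return the first value of `param` found on a matching entry URL.

    `entries` is the list of dicts returned by driver.execute_script(...).
    Assumption: each dict has a "name" key holding the request URL.
    """
    for entry in entries:
        url = entry.get("name", "")
        if url_fragment not in url:
            continue
        values = parse_qs(urlparse(url).query).get(param)
        if values:
            return values[0]
    return None


# Hypothetical data shaped like the browser's performance entries.
sample = [
    {"name": "https://example.com/static/app.js", "entryType": "resource"},
    {"name": "https://example.com/api/items?token=abc123&page=2", "entryType": "resource"},
]
print(find_query_param(sample, "/api/", "token"))
```

In a real session, the list assigned to test in the snippet above would be passed in place of sample.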

Using python + selenium + Chrome

EDIT: this answer got a lot of attention, so here is how I'm doing it now with Chrome (taken from the undetected-chromedriver code):

import json

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.set_capability(
    "goog:loggingPrefs", {"performance": "ALL", "browser": "ALL"}
)
driver = webdriver.Chrome(options=chrome_options)

# Visit your website, log in, etc., then:
log_entries = driver.get_log("performance")

bearer_token = None
for entry in log_entries:
    # Each entry's "message" field is a JSON string wrapping a DevTools event.
    obj = json.loads(entry.get("message"))
    message = obj.get("message")
    method = message.get("method")
    if method in ("Network.requestWillBeSentExtraInfo", "Network.requestWillBeSent"):
        # associatedCookies is sent with the ExtraInfo event; default to an
        # empty list so events without it are skipped quietly.
        for c in message.get("params", {}).get("associatedCookies", []):
            cookie = c.get("cookie", {})
            if cookie.get("name") == "authToken":
                bearer_token = cookie.get("value")
    print(type(message), method)
    print("--------------------------------------")

With this method you can parse out tokens, API keys, etc. that your browser is sending to the server.
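
For plain request debugging, the same performance log can be reduced to a list of HTTP methods and URLs. The following is a minimal sketch, not part of the original answer: the helper names list_requests and make_entry and the sample data are made up for illustration, and the sketch relies on Network.requestWillBeSent events carrying the request under params.request.

```python
import json


def list_requests(log_entries):
    """Return (http_method, url) tuples for every Network.requestWillBeSent event."""
    found = []
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.requestWillBeSent":
            continue
        request = message.get("params", {}).get("request", {})
        found.append((request.get("method"), request.get("url")))
    return found


def make_entry(method, params):
    """Build a fake log entry shaped like one item of driver.get_log('performance')."""
    return {"message": json.dumps({"message": {"method": method, "params": params}})}


# Hypothetical entries so the sketch runs without a browser.
sample_log = [
    make_entry("Network.requestWillBeSent",
               {"requestId": "1", "request": {"method": "GET", "url": "https://example.com/"}}),
    make_entry("Network.responseReceived",
               {"requestId": "1", "response": {"status": 200}}),
    make_entry("Network.requestWillBeSent",
               {"requestId": "2", "request": {"method": "POST", "url": "https://example.com/api"}}),
]
for http_method, url in list_requests(sample_log):
    print(http_method, url)
```

With a live driver, list_requests(driver.get_log("performance")) would take the place of the sample data.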

Admissive answered 2/1, 2021 at 11:26 Comment(3)
['Network.requestWillBeSentExtraInfo' or 'Network.requestWillBeSent'] evaluates to ['Network.requestWillBeSentExtraInfo']. If the second element of this list is actually useful then this code doesn't do what is expected. If it's not then you should omit it.Kolivas
This code does exactly what I'm expecting it to do: I'm listening to the network requests the browser is sending, which includes things like API keys, tokens, etc. and parsing those tokens out so I can use them.Admissive
This code doesn't extract tokens from messages with method equal to 'Network.requestWillBeSent' because it is not in the list of methods to be parsed.Kolivas
S
6

Using Python and ChromeDriver

To get network logs, you need to install BrowserMob Proxy alongside Selenium in Python:

pip install browsermob-proxy

Then download the BrowserMob Proxy zip from https://bmp.lightbody.net/.

Unzip it to any folder (e.g. path/to/extracted_folder). This folder contains the browsermob-proxy binary. Pass the path to that binary when calling Server() in the Python code.

Start the proxy server and configure the proxy in the Chrome options of the ChromeDriver:

from browsermobproxy import Server
from selenium import webdriver

server = Server("path/to/extracted_folder/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

# Configure the browser proxy in chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
browser = webdriver.Chrome(options=chrome_options)

# Tag the HAR (network logs) with a name
proxy.new_har("google")

Then navigate to the page using Selenium:

browser.get("http://www.google.co.in")

After navigation, you can get the network logs (HAR data) from the proxy:

print(proxy.har)  # the network logs (HAR) as a dict
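
Printing the whole HAR is noisy, so a short summary is often easier to read. This is a minimal sketch that assumes the standard HAR layout (requests under log.entries, each with request and response objects); the helper name summarize_har and the sample data are hypothetical.

```python
def summarize_har(har):
    """Return (http_method, url, status) tuples for each entry in a HAR dict."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((request.get("method"), request.get("url"), response.get("status")))
    return rows


# Hypothetical HAR fragment so the sketch runs without a proxy.
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "http://www.google.co.in/"},
             "response": {"status": 200}},
            {"request": {"method": "GET", "url": "http://www.google.co.in/missing"},
             "response": {"status": 404}},
        ]
    }
}
for row in summarize_har(sample_har):
    print(row)
```

With the proxy running, summarize_har(proxy.har) would be used instead of the sample dict.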

Finally, stop the proxy server and quit the driver:

server.stop()
browser.quit()
Sand answered 13/11, 2018 at 18:38 Comment(2)
Tried this and got KeyError: 'port'. Trying again without changing anything then led to OSError: [WinError 6] The handle is invalidCapua
If you are getting error about path ... try this #55987205Necropolis
F
2

Try selenium-wire. I think it is a better approach, and it also integrates with undetected-chromedriver to help against bot detection.

Use it like this:

# pip install selenium-wire
from seleniumwire import webdriver  # Import from seleniumwire

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# Go to the Google home page
driver.get('https://www.google.com')

# Access requests via the `requests` attribute
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )
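
To read response bodies rather than just headers, the body lives on request.response.body, as the comment under the question points out. Below is a minimal sketch using stand-in objects so it runs without a browser; the helper name collect_response_bodies and the example URLs are hypothetical.

```python
from types import SimpleNamespace


def collect_response_bodies(requests, url_fragment):
    """Map URL -> raw response body for requests whose URL contains url_fragment."""
    bodies = {}
    for request in requests:
        # Requests that never received a response have response set to None.
        if request.response is None:
            continue
        if url_fragment in request.url:
            bodies[request.url] = request.response.body
    return bodies


# Stand-ins for selenium-wire request objects (assumption: only the
# url, response and response.body attributes are needed here).
fake_requests = [
    SimpleNamespace(url="https://example.com/api/user",
                    response=SimpleNamespace(status_code=200, body=b'{"id": 1}')),
    SimpleNamespace(url="https://example.com/pending", response=None),
]
print(collect_response_bodies(fake_requests, "/api/"))
```

In a real session this would be called as collect_response_bodies(driver.requests, "/api/").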
Firdausi answered 26/4, 2021 at 10:14 Comment(1)
pypi.org/project/selenium-wire is the most simple / easy to use solution as it extended Selenium. @anand-s please accept this answer.Gastrotrich
P
0

I'm using selenium 4.11 but the following may help.

import json
from selenium import webdriver

# Initialize Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Navigate to the target website
driver.get("https://your-website.com")

# Capture network log entries
log_entries = driver.get_log("performance")

# Initialize variables to store the last known URL
last_known_url = None

# Initialize lists to store request and response headers
request_headers_data = []
response_headers_data = []

for entry in log_entries:
    try:
        obj_serialized = entry.get("message")
        obj = json.loads(obj_serialized)
        message = obj.get("message")
        method = message.get("method")
        url = message.get("params", {}).get("documentURL")

        # Update last known URL if available
        if url:
            last_known_url = url

        if method == 'Network.requestWillBeSentExtraInfo' or method == 'Network.requestWillBeSent':
            try:
                request_payload = message['params'].get('request', {})
                request_headers = request_payload.get('headers', {})
                # Store request headers and last known URL in request_headers_data
                request_headers_data.append({"url": last_known_url, "headers": request_headers})
            except KeyError:
                pass

        if method == 'Network.responseReceivedExtraInfo' or method == 'Network.responseReceived':
            try:
                response_payload = message['params'].get('response', {})
                response_headers = response_payload.get('headers', {})
                # Store response headers and last known URL in response_headers_data
                response_headers_data.append({"url": last_known_url, "headers": response_headers})
            except KeyError:
                pass

        if method == 'Network.loadingFinished':
            # Network request is finished, you can now access request_headers_data and response_headers_data
            print("Request Headers:")
            for request_data in request_headers_data:
                print("URL:", request_data["url"])
                print(request_data["headers"])
            print("Response Headers:")
            for response_data in response_headers_data:
                print("URL:", response_data["url"])
                print(response_data["headers"])
            print('--------------------------------------')
    except Exception as e:
        raise e from None

# Close the WebDriver
driver.quit()
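
One detail worth checking when adapting the code above: in the DevTools protocol, the ExtraInfo events carry their headers directly under params.headers, while Network.requestWillBeSent and Network.responseReceived nest them under params.request and params.response. A minimal sketch of a lookup that handles both shapes (the helper name extract_headers and the sample messages are hypothetical):

```python
def extract_headers(message):
    """Return the headers dict for a decoded DevTools Network event, or {}."""
    method = message.get("method", "")
    params = message.get("params", {})
    if method in ("Network.requestWillBeSentExtraInfo",
                  "Network.responseReceivedExtraInfo"):
        # ExtraInfo events expose headers at the top level of params.
        return params.get("headers", {})
    if method == "Network.requestWillBeSent":
        return params.get("request", {}).get("headers", {})
    if method == "Network.responseReceived":
        return params.get("response", {}).get("headers", {})
    return {}


# Hypothetical decoded messages (what `message` holds inside the loop above).
extra = {"method": "Network.requestWillBeSentExtraInfo",
         "params": {"requestId": "1", "headers": {"Cookie": "a=b"}}}
plain = {"method": "Network.requestWillBeSent",
         "params": {"requestId": "1", "request": {"headers": {"Accept": "*/*"}}}}
print(extract_headers(extra))
print(extract_headers(plain))
```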
Paschasia answered 9/9, 2023 at 10:13 Comment(0)
P
0

Or, if you want the output sequentially, with each request followed by its response:

import json
from selenium import webdriver

# Initialize Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Navigate to the target website
driver.get("https://your-website.com")

# Capture network log entries
log_entries = driver.get_log("performance")

# Initialize lists to store request and response headers
request_headers_data = []
response_headers_data = []
last_known_url = None  # To keep track of the URL associated with the latest entry

for entry in log_entries:
    try:
        obj_serialized = entry.get("message")
        obj = json.loads(obj_serialized)
        message = obj.get("message")
        method = message.get("method")
        url = message.get("params", {}).get("documentURL")

        if method == 'Network.requestWillBeSentExtraInfo' or method == 'Network.requestWillBeSent':
            try:
                request_payload = message['params'].get('request', {})
                request_headers = request_payload.get('headers', {})
                # Store request headers and last known URL in request_headers_data
                request_headers_data.append({"url": url, "headers": request_headers})
                last_known_url = url
            except KeyError:
                pass

        if method == 'Network.responseReceivedExtraInfo' or method == 'Network.responseReceived':
            try:
                response_payload = message['params'].get('response', {})
                response_headers = response_payload.get('headers', {})
                # Store response headers and last known URL in response_headers_data
                response_headers_data.append({"url": url, "headers": response_headers})
                last_known_url = url
            except KeyError:
                pass

    except Exception as e:
        raise e from None

# Iterate through the headers sequentially
for request_headers, response_headers in zip(request_headers_data, response_headers_data):
    print("Request URL:", request_headers["url"])
    print("Request Headers:", request_headers["headers"])
    print("Response URL:", response_headers["url"])
    print("Response Headers:", response_headers["headers"])
    print('--------------------------------------')

# Close the WebDriver
driver.quit()
Paschasia answered 9/9, 2023 at 11:1 Comment(1)
I don't believe zipping those two lists will line up properly. There is no guarantee the messages are in order. It's possible to get a responseReceivedExtraInfo before or after the corresponding responseReceived.Runion
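
One way to avoid relying on message order is to key everything on the requestId that the DevTools Network events share. The following is a minimal sketch under that assumption, not the answer author's code; pair_by_request_id and make_entry are hypothetical names, and for redirected requests (which reuse a requestId) it simply keeps the last event seen.

```python
import json


def pair_by_request_id(log_entries):
    """Group request and response data from performance log entries by requestId."""
    pairs = {}
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        request_id = params.get("requestId")
        if request_id is None:
            continue
        if method == "Network.requestWillBeSent":
            pairs.setdefault(request_id, {})["request"] = params.get("request", {})
        elif method == "Network.responseReceived":
            pairs.setdefault(request_id, {})["response"] = params.get("response", {})
    return pairs


def make_entry(method, params):
    """Build a fake log entry shaped like one item of driver.get_log('performance')."""
    return {"message": json.dumps({"message": {"method": method, "params": params}})}


# Hypothetical, deliberately interleaved entries.
sample_log = [
    make_entry("Network.requestWillBeSent", {"requestId": "A", "request": {"url": "https://example.com/a"}}),
    make_entry("Network.requestWillBeSent", {"requestId": "B", "request": {"url": "https://example.com/b"}}),
    make_entry("Network.responseReceived", {"requestId": "B", "response": {"status": 404}}),
    make_entry("Network.responseReceived", {"requestId": "A", "response": {"status": 200}}),
]
for request_id, pair in pair_by_request_id(sample_log).items():
    print(request_id, pair["request"]["url"], pair["response"]["status"])
```

Requests that have not received a response yet end up with only a "request" key, so real code should check for "response" before reading it.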
H
-1

For the latest Python Selenium version, 4.1.0, webdriver.get_log(self, log_type) only has 4 log types:

driver.get_log('browser')
driver.get_log('driver')
driver.get_log('client')
driver.get_log('server')

You can't get the performance log with the driver.get_log function.

Honor answered 29/11, 2021 at 9:11 Comment(0)
A
-4

To get only the network logs up until the page has finished loading (no ajax/async network logs during the main usage of the page), you can get the Performance Log: http://chromedriver.chromium.org/logging/performance-log

For example, to enable performance logging for ChromeDriver:

DesiredCapabilities cap = DesiredCapabilities.chrome();
LoggingPreferences logPrefs = new LoggingPreferences();
logPrefs.enable(LogType.PERFORMANCE, Level.ALL);
cap.setCapability(CapabilityType.LOGGING_PREFS, logPrefs);
RemoteWebDriver driver = new RemoteWebDriver(new URL("http://127.0.0.1:9515"), cap);

The Chromium performance-log page also links to this complete example, https://gist.github.com/klepikov/5457750, which has Java and Python code to get the performance logs.

Again, it's important to keep in mind that this will only get the network requests up until the point that the page is finished loading. After that, the driver will only return the same performance logs until the page reloads.


If you want to get network logs asynchronously throughout the usage of the page, you can use BrowserMobProxy to act as a proxy server for your Selenium driver and capture all those network requests. Then, you can get those captured requests from BrowserMobProxy's generated HAR file: https://github.com/lightbody/browsermob-proxy#using-with-selenium

// start the proxy
BrowserMobProxy proxy = new BrowserMobProxyServer();
proxy.start(0);

// get the Selenium proxy object
Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);

// configure it as a desired capability
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setCapability(CapabilityType.PROXY, seleniumProxy);

// start the browser up
WebDriver driver = new FirefoxDriver(capabilities);

// enable more detailed HAR capture, if desired (see CaptureType for the complete list)
proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT);

// create a new HAR with the label "yahoo.com"
proxy.newHar("yahoo.com");

// open yahoo.com
driver.get("http://yahoo.com");

// get the HAR data
Har har = proxy.getHar();

Once you have the HAR file, it is a JSON-like list of network events that you can work with.

Actuary answered 13/11, 2018 at 18:5 Comment(1)
The first example is for Java. I need an example for Python.Josiejosler
