How do you open multiple pages asynchronously with Playwright Python?

I want to open multiple URLs at once using Playwright for Python, but I am struggling to figure out how. This is from the async documentation:

async def main():
    async with async_playwright() as p:
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = await browser_type.launch()
            page = await browser.new_page()
            await page.goto("https://scrapingant.com/")
            await page.screenshot(path=f"scrapingant-{browser_type.name}.png")
            await browser.close()

asyncio.get_event_loop().run_until_complete(main())

This opens each browser_type sequentially. How would I run them in parallel? And how would I do something similar with a list of URLs?

I tried doing this:

urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
]
async def main(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto(url)
        await browser.close()

async def go_to_url():
    tasks = [main(url) for url in urls]
    await asyncio.wait(tasks)

go_to_url()

But that gave me the following error:

92: RuntimeWarning: coroutine 'go_to_url' was never awaited
  go_to_url()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Stent answered 3/11, 2020 at 14:8 Comment(0)

I believe you need to call your go_to_url function using the same recipe:

asyncio.get_event_loop().run_until_complete(go_to_url())
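
On Python 3.7 and later, asyncio.run is a simpler entry point than fetching the event loop by hand. Also note that asyncio.wait no longer accepts bare coroutines as of Python 3.11, so the task list from the question is easier to run with asyncio.gather. A minimal sketch under those assumptions; the helper names visit and go_to_urls are made up for illustration, and the Playwright import sits inside main only so the helpers can be exercised without a browser:

```python
import asyncio

urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
]


async def visit(browser, url):
    # Hypothetical helper: open one page, load the URL, return its title.
    page = await browser.new_page()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        await page.close()


async def go_to_urls(browser, urls):
    # gather schedules every coroutine concurrently and keeps input order.
    return await asyncio.gather(*(visit(browser, url) for url in urls))


async def main():
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            return await go_to_urls(browser, urls)
        finally:
            await browser.close()
```

Start it with asyncio.run(main()); calling main() or go_to_urls() without awaiting or running it only creates a coroutine object, which is what the RuntimeWarning in the question is reporting.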
Gent answered 3/11, 2020 at 14:24 Comment(0)

I struggled to get the original code to work, even with @hardkoded's answer. With Python 3.11, the following code works for me. I open every URL in the same browser context so that only one browser window opens.

import asyncio
from playwright.async_api import async_playwright

urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
    "https://scrapethissite.com/pages/ajax-javascript/#2013",
]

async def get_detail(context, url):
    page = await context.new_page()
    await page.goto(url) 
    await page.wait_for_load_state(state="networkidle")
    await page.wait_for_timeout(1000)
    await page.close()

async def open_new_pages(context, urls):
    # Creating tasks: https://docs.python.org/3.11/library/asyncio-task.html#creating-tasks
    background_tasks = set()
    for url in urls:
        task = asyncio.create_task(
            get_detail(context, url)
        )
        background_tasks.add(task)
        # task.add_done_callback(background_tasks.discard)
        # Above produced an error for me since the set then gets changed while the loop is running.
    
    # Await each of the tasks:
    for t in background_tasks:
        await t

    
async def main(urls):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        await open_new_pages(context, urls)

asyncio.run(main(urls))
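
The commented-out add_done_callback line above fails because the set is modified while the loop iterates over it. One way around that, sketched here as an assumption rather than a definitive fix, is to keep the tasks in a list and hand them to asyncio.gather. With return_exceptions=True a failing URL does not discard the results of the others. Returning the page title is only an illustrative choice:

```python
import asyncio


async def get_detail(context, url):
    page = await context.new_page()
    try:
        await page.goto(url)
        await page.wait_for_load_state(state="networkidle")
        return await page.title()
    finally:
        # Close the page even if navigation raised.
        await page.close()


async def open_new_pages(context, urls):
    # A list keeps strong references to the tasks and is never mutated here.
    tasks = [asyncio.create_task(get_detail(context, url)) for url in urls]
    # Exceptions are returned in place of results instead of being raised.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return dict(zip(urls, results))
```

The returned dict maps each URL to either its title or the exception raised while loading it, so the caller can decide which failures to retry.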

I believe task groups (asyncio.TaskGroup, https://docs.python.org/3.11/library/asyncio-task.html#task-groups), which were added in Python 3.11, are the more modern way to do this:

import asyncio
from playwright.async_api import async_playwright

urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
    "https://scrapethissite.com/pages/ajax-javascript/#2013",
]

async def get_detail(context, url):
    page = await context.new_page()
    await page.goto(url) 
    await page.wait_for_load_state(state="networkidle")
    await page.wait_for_timeout(1000)
    await page.close()

async def open_new_pages(context, urls):
    async with asyncio.TaskGroup() as tg:
        for url in urls:
            # The task group awaits every task when the async with block exits.
            tg.create_task(get_detail(context, url))
    
async def main(urls):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        await open_new_pages(context, urls)

asyncio.run(main(urls))
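
The first half of the question, running the three browser types in parallel, can be handled the same way. A rough sketch, assuming hypothetical helper names screenshot_with and screenshot_all, and using asyncio.gather so that it also runs on Python versions before 3.11:

```python
import asyncio


async def screenshot_with(browser_type, url):
    # Launch one browser, take a screenshot, and always close the browser.
    browser = await browser_type.launch()
    try:
        page = await browser.new_page()
        await page.goto(url)
        path = f"scrapingant-{browser_type.name}.png"
        await page.screenshot(path=path)
        return path
    finally:
        await browser.close()


async def screenshot_all(browser_types, url):
    # One coroutine per browser type, all running concurrently.
    return await asyncio.gather(
        *(screenshot_with(bt, url) for bt in browser_types)
    )


async def main(url="https://scrapingant.com/"):
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        return await screenshot_all([p.chromium, p.firefox, p.webkit], url)
```

Run it with asyncio.run(main()). Each browser type gets its own process, so this uses noticeably more memory than opening several pages in one context.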
Singe answered 20/12, 2022 at 22:23 Comment(1)
page.close needs parentheses to work: page.close(). – Karr
