How to download PDF files with Playwright? (Python)

Asked 16/7, 2021 at 12:46 Answered 22/1, 2023 at 15:2

Solved python playwright playwright-python

I'm trying to automate downloading a PDF file with Playwright. I already have the code working with Selenium, but some of Playwright's features caught my attention. The real problem is that the documentation isn't helpful. When I click on download I get this:

I also can't change the download directory, and the "file" is deleted when the browser or context is closed. Can I achieve a clean download automation with Playwright?

Code:

from playwright.sync_api import sync_playwright

def run(playwright):
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context(accept_downloads=True)

    # Open new page
    page = context.new_page()

    # Go to http://xcal1.vodafone.co.uk/
    page.goto("http://xcal1.vodafone.co.uk/")

    # Click text=Extra Small File 5 MB A high quality 5 minute MP3 music file 30secs @ 2 Mbps 10s >> img
    with page.expect_download() as download_info:
        page.click("text=Extra Small File 5 MB A high quality 5 minute MP3 music file 30secs @ 2 Mbps 10s >> img")
    download = download_info.value
    path = download.path()
    download.save_as(path)
    print(path)

    # ---------------------
    context.close()
    browser.close()

with sync_playwright() as playwright:
    run(playwright)

Weatherglass answered 16/7, 2021 at 12:46 Comment(0)

The path returned by download.path() in Playwright points to a temporary file named with a random GUID (globally unique identifier). It's designed to validate that the download works, not to keep the file.

Playwright is a testing tool. Imagine running tests across every major browser on every code change: the downloads would quickly take up a lot of space, and it would annoy people if they had to clear them out manually.

The good news is that you are very close. If you want to keep the file, you just need to give it a name in save_as.

instead of this:

   download.save_as(path)

use this:

   download.save_as(download.suggested_filename)

That saves the file in the current working directory, which is the script's own directory if you run the script from there.

Keeleykeelhaul answered 4/8, 2021 at 20:22 Comment(2)

I disagree with the notion that "playwright is a testing tool". It is a browser automation tool ("playwright") as well as a testing tool ("@playwright/test" ). Thanks for the answer! 👍 – Counterproposal 24/1, 2022 at 12:28

Could you please elaborate on what is the correct syntax to save the file in other directory? Would it work if instead of suggested_filename a path for download to be saved was indicated? – Goodyear 20/4, 2022 at 17:51

You can save at any location with download.save_as(path)

This worked for me.

from pathlib import Path

...
download.save_as(Path.home().joinpath('Downloads', download.suggested_filename))
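
To make the directory part a bit more reusable, here is a minimal sketch. `save_download` is a hypothetical helper name (not part of Playwright); it assumes the sync API and only relies on `download.suggested_filename` and `download.save_as`, both used above. It creates the target directory if needed and keeps only the final component of the suggested name.

```python
from pathlib import Path


def save_download(download, directory):
    """Hypothetical helper: save a Playwright download into `directory`.

    `download` is expected to expose `suggested_filename` and `save_as`,
    as the sync Playwright Download object does.
    """
    target_dir = Path(directory).expanduser()
    # Create the directory up front so the save does not depend on it existing.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the last path component of the suggested name; fall back
    # to a generic name if it is empty.
    name = Path(download.suggested_filename).name or "download"
    target = target_dir / name
    download.save_as(target)
    return target
```

Inside the `run` function from the question, this would be called as `save_download(download, Path.home() / "Downloads")` right after `download = download_info.value`.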

Wrasse answered 31/7, 2022 at 11:13 Comment(0)

This works for me:

url = config.url  # your file URL
# page_request is assumed to be a Playwright APIRequestContext, e.g. page.request
response = await page_request.get(url, params={'id': file_id})  # your request
file = await response.body()  # downloaded file contents as bytes, not yet saved
file_name = 'filename.pdf'  # name to save the file under
with open(file_name, 'wb') as f:
    f.write(file)
print(f'File {file_name} is saved')
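
The same idea can be wrapped in a small function, as a sketch. `fetch_to_file` is a hypothetical name, and `request_context` is assumed to be an async Playwright APIRequestContext such as `page.request` or `context.request`. It only uses `get`, `ok`, `status` and `body`, and it raises instead of silently writing an error page to disk.

```python
from pathlib import Path


async def fetch_to_file(request_context, url, target, params=None):
    """Hypothetical helper: fetch `url` and write the response body to `target`.

    `request_context` is assumed to be an async APIRequestContext,
    for example `page.request`.
    """
    response = await request_context.get(url, params=params)
    if not response.ok:
        # Avoid saving an HTML error page under a .pdf name.
        raise RuntimeError(f"Request failed with status {response.status}")
    data = await response.body()  # raw bytes of the file
    target = Path(target)
    target.write_bytes(data)
    return target
```

With the names from the snippet above, the call would look like `await fetch_to_file(page_request, url, 'filename.pdf', params={'id': file_id})`. This bypasses the browser's download handling entirely, so it only fits cases where the file is reachable with a plain HTTP request.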

Deepsea answered 17/10, 2022 at 11:39 Comment(1)

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Triacid 19/10, 2022 at 16:25

When I tried similar code, I got this error:

playwright._impl._api_types.Error: net::ERR_ABORTED at https://www.africau.edu/images/default/sample.pdf
=========================== logs ===========================
navigating to "https://www.africau.edu/images/default/sample.pdf", waiting until "load"
============================================================

In retrospect, this is likely because I run playwright.chromium.launch_persistent_context(user_dir) with "always_open_pdf_externally:true" set, as in this example: https://github.com/microsoft/playwright/issues/3509 Instead, what I needed to do was wrap the navigation in try/except, like this:

    # Assumes `import os` and that downloads_path and file_name are defined elsewhere.
    async with page.expect_download() as download_info:
        try:
            await page.goto("https://www.africau.edu/images/default/sample.pdf", timeout=5000)
        except Exception:
            # goto() raises net::ERR_ABORTED here because the navigation turns into a download
            print("Saving file to ", downloads_path, file_name)
            download = await download_info.value
            print(await download.path())
            await download.save_as(os.path.join(downloads_path, file_name))
        await page.wait_for_timeout(200)

Maybe this helps someone. It seems there isn't a clean method for this yet: https://github.com/microsoft/playwright/issues/7822
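
A slightly tidier variation on this workaround, as a sketch only. `download_via_goto` is a hypothetical helper name. It assumes the async API and that the navigation error message contains `net::ERR_ABORTED`, as in the log above; other browsers or versions may word the error differently. Any other error is re-raised, and the file is saved after the `expect_download` block has finished.

```python
from pathlib import Path


async def download_via_goto(page, url, target_dir):
    """Hypothetical helper: navigate straight to a file URL and keep the download."""
    async with page.expect_download() as download_info:
        try:
            await page.goto(url)
        except Exception as exc:
            # Assumption: the aborted navigation is reported as net::ERR_ABORTED,
            # as in the error log above. Anything else is a real failure.
            if "net::ERR_ABORTED" not in str(exc):
                raise
    # Leaving the block waits for the download event, so the value is ready here.
    download = await download_info.value
    target = Path(target_dir) / download.suggested_filename
    await download.save_as(target)
    return target
```

If the PDF opens in the built-in viewer instead of downloading, no download event fires and `expect_download` times out, so this still depends on the browser being set up to download PDFs.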

Jay answered 22/1, 2023 at 15:2 Comment(0)
