How to save an opened page as PDF in Selenium (Python)
B

6

29

I have tried every solution I could find online for printing a page that is open in Selenium (Python) to a PDF. The print pop-up does show up, but after a second or two it goes away and no PDF is saved.

Here is the code I am trying, based on the code here: https://mcmap.net/q/502266/-selecting-every-options-from-a-drop-down-box-using-selenium-python

Coding on a Mac with Mojave 10.14.5.

from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import WebDriverException
import time
import json

options = Options()
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
# profile = {'printing.print_preview_sticky_settings.appState':json.dumps(appState),'savefile.default_directory':downloadPath}
options.add_experimental_option('prefs', profile)
options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'

driver = webdriver.Chrome(options=options, executable_path=CHROMEDRIVER_PATH)
driver.implicitly_wait(5)
driver.get(url)
driver.execute_script('window.print();')

$ chromedriver --v
ChromeDriver 75.0.3770.90 (a6dcaf7e3ec6f70a194cc25e8149475c6590e025-refs/branch-heads/3770@{#1003})

Any hints or solutions for printing the open HTML page to a PDF? I have spent hours trying to make this work. Thank you!


Update on 2019-07-11:

My question has been identified as a duplicate, but a) the other question seems to use JavaScript code, and b) the answer there does not solve the problem raised in this question, which may have to do with more recent software versions. I am using Chrome Version 75.0.3770.100 (Official Build) (64-bit) and ChromeDriver 75.0.3770.90 on macOS Mojave. The script runs on Python 3.7.3.

Update on 2019-07-11:

Changed the code to

from selenium import webdriver
import json

chrome_options = webdriver.ChromeOptions()
settings = {
    "appState": {
        "recentDestinations": [{
            "id": "Save as PDF",
            "origin": "local",
            "account": "",
        }],
        "selectedDestinationId": "Save as PDF",
        "version": 2
    }
}
prefs = {'printing.print_preview_sticky_settings': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=CHROMEDRIVER_PATH)
driver.get("https://google.com")
driver.execute_script('window.print();')
driver.quit()

And now nothing happens. Chrome launches, loads the URL, and the print dialog appears, but then nothing seems to follow: there is nothing in the default printer queue and no PDF either. I even searched for PDF files under "Recent Files" on the Mac.

Benn answered 5/7, 2019 at 5:32 Comment(10)
no PDF saved, where did you check? It should be saved in your user Downloads folder.Hugohugon
@Hugohugon - I tried this again and noticed that Chrome was sending an actual printout to my default printer; I was not in the same location, so I had not noticed what actually happened. I have now cleared the print queue left over from the numerous times I tried printing to PDF and it appeared that nothing happened. So I suspect that the "Save as PDF" option is not getting selected, and I do not know how to select it.Benn
Please refer to this answer. In your code, you are calling webdriver.Chrome(options=options.., but correct syntax is webdriver.Chrome(chrome_options=options... And somehow, with webdriver.ChromeOptions print is working faster than with webdriver.chrome.options.Options, so I would suggest you to try that.Hugohugon
Possible duplicate of Set Selenium ChromeDriver UserPreferences to Save as PDFHugohugon
@Hugohugon - Thank you for your comments. I just tried that as well and changed the code to chrome_options = webdriver.ChromeOptions(). webdriver.ChromeOptions does indeed seem to work faster, but even this option sends a printout to the default printer and not to PDF :( Still looking for advice on how this can be done; if not with Selenium, then I wonder if it is possible with some other library. However, the page that I need to reach is behind a login procedure.Benn
The code on other question works for me, so can you please update your question with latest code you tried?Hugohugon
Updated the question with the latest code that I used. This time nothing seems to go anywhere even though the print dialog does appear to launch. The print dialog is too quick and cannot read what printer or whether the PDF option is selected. Intrigued. Thanks @Hugohugon for staying engaged and helping me solve this.Benn
How do you mean solved? The only thing your script does for me is open "save as". It doesn't actually save it itself.Kelsy
Oh sorry never mind. Instead of calling the correct chromedriver I used this. driver = webdriver.Chrome(ChromeDriverManager().install()) but it ruined everything. now I explicitly used driver = webdriver.Chrome(chrome_options=chrome_options , executable_path="/Applications/chrome/chromedriver") and it works!Kelsy
@GregW.F.R glad it worked. I have not used this in a long time. But yes that is the way to instantiate a chrome driver instance.Benn
H
27

The answer here worked when I did not have any other printer set up in my OS, but when I had another default printer it did not work.

I don't understand why, but making this small change seems to work.

from selenium import webdriver
import json

chrome_options = webdriver.ChromeOptions()
settings = {
    "recentDestinations": [{
        "id": "Save as PDF",
        "origin": "local",
        "account": "",
    }],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}
prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=CHROMEDRIVER_PATH)
driver.get("https://google.com")
driver.execute_script('window.print();')
driver.quit()
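
Compared with the second attempt in the question, the visible difference is the preference key: here the JSON string is stored under `printing.print_preview_sticky_settings.appState`, not under `printing.print_preview_sticky_settings` with a nested `appState` object. Below is a minimal sketch of a helper that builds these prefs and optionally sets `savefile.default_directory` (the key that appears commented out in the question). The function name `build_pdf_prefs` and the example directory are hypothetical, and whether Chrome honours the directory may depend on the Chrome version.

```python
import json


def build_pdf_prefs(download_dir=None):
    """Return Chrome prefs that preselect "Save as PDF" in the print preview."""
    app_state = {
        "recentDestinations": [
            {"id": "Save as PDF", "origin": "local", "account": ""}
        ],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    # As in the answers here, appState is passed as a JSON string, not a dict.
    prefs = {
        "printing.print_preview_sticky_settings.appState": json.dumps(app_state)
    }
    if download_dir is not None:
        # Assumption: this pref controls where the PDF ends up.
        prefs["savefile.default_directory"] = download_dir
    return prefs


if __name__ == "__main__":
    # "/tmp/pdf-out" is a placeholder path for illustration only.
    print(build_pdf_prefs("/tmp/pdf-out"))
```

The result would be passed on as `chrome_options.add_experimental_option('prefs', build_pdf_prefs('/tmp/pdf-out'))`, together with the `--kiosk-printing` argument used above.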
Hugohugon answered 18/7, 2019 at 8:22 Comment(5)
Thank you @Kamal. This approach does work, but it printed to the last-used printer. After some searching, I wonder whether installing cups-pdf as a printer, and making it the last-used printer, would give the desired outcome: print-to-PDF using Python.Benn
Sorry I couldn't test my solution on Linux, it worked on Windows 10 for me.Hugohugon
got it. Will work on this some more and see if I can come up with something.Benn
Worked on Linux for me. Would be nice if we could control the download location, however.Roebuck
@RobHall The solution https://mcmap.net/q/502267/-selenium-chrome-save-as-pdf-change-download-folderHimyarite
P
9

You can use the following code to print PDFs in A5 size with background css enabled:

import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import json
import time

chrome_options = webdriver.ChromeOptions()

settings = {
    "recentDestinations": [{
        "id": "Save as PDF",
        "origin": "local",
        "account": ""
    }],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    "isHeaderFooterEnabled": False,
    "mediaSize": {
        "height_microns": 210000,
        "name": "ISO_A5",
        "width_microns": 148000,
        "custom_display_name": "A5"
    },
    "customMargins": {},
    "marginsType": 2,
    "scaling": 175,
    "scalingType": 3,
    "scalingTypePdf": 3,
    "isCssBackgroundEnabled": True
}

mobile_emulation = { "deviceName": "Nexus 5" }
chrome_options.add_experimental_option("mobileEmulation", mobile_emulation)
chrome_options.add_argument('--enable-print-browser')
#chrome_options.add_argument('--headless')

prefs = {
    'printing.print_preview_sticky_settings.appState': json.dumps(settings),
    'savefile.default_directory': '<path>'
}
chrome_options.add_argument('--kiosk-printing')
chrome_options.add_experimental_option('prefs', prefs)

for dirpath, dirnames, filenames in os.walk('<source path>'):
    for fileName in filenames:
        print(fileName)
        driver = webdriver.Chrome("./chromedriver", options=chrome_options)
        driver.get(f'file://{os.path.join(dirpath, fileName)}')
        time.sleep(7)
        driver.execute_script('window.print();')
        driver.close()
Planar answered 25/11, 2020 at 13:3 Comment(2)
This solution worked great for me. savefile.default_directory takes both forward and backslash paths (on Windows 10). However, this fails more often than it succeeds for me because the browser closes before the file is fully written. This can be solved by adding sleep(5) before driver.close() or some more intelligent structure.Dereism
It seems like headless is commented out, and with headless on it doesn't work. Any idea how to make it work in a headless browser?Ironwork
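
The race described in the first comment above (the browser closing before the file is fully written) could be handled by polling the output directory instead of sleeping for a fixed time. This is a minimal sketch using only the standard library; the function name `wait_for_pdf`, its parameters, and the "size unchanged between two polls" heuristic are assumptions for illustration, not something Chrome or Selenium provides.

```python
import os
import time


def wait_for_pdf(directory, timeout=30.0, poll=0.5, ignore=frozenset()):
    """Wait until a .pdf file in `directory` stops growing and return its path.

    `ignore` holds file names that were already present before printing.
    """
    deadline = time.monotonic() + timeout
    last_sizes = {}
    while time.monotonic() < deadline:
        for name in os.listdir(directory):
            if name in ignore or not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(directory, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # the file vanished between listdir() and getsize()
            # A non-empty file whose size has not changed since the previous
            # poll is treated as completely written.
            if size > 0 and last_sizes.get(path) == size:
                return path
            last_sizes[path] = size
        time.sleep(poll)
    raise TimeoutError(f"no finished PDF appeared in {directory!r}")
```

In the loop above it would be called between `driver.execute_script('window.print();')` and `driver.close()`, with `directory` set to the same path as `savefile.default_directory`.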
S
6

Here is the solution I use on Windows:

  • First, download ChromeDriver from http://chromedriver.chromium.org/downloads and install Selenium.

  • Then run this code (based on the accepted answer, slightly modified to work on Windows):

    import json
    from selenium import webdriver
    chrome_options = webdriver.ChromeOptions()
    settings = {"recentDestinations": [{"id": "Save as PDF", "origin": "local", "account": ""}], "selectedDestinationId": "Save as PDF", "version": 2}
    prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
    chrome_options.add_experimental_option('prefs', prefs)
    chrome_options.add_argument('--kiosk-printing')
    browser = webdriver.Chrome(r"chromedriver.exe", options=chrome_options)
    browser.get("https://google.com/")
    browser.execute_script('window.print();')
    browser.close()
Selfannihilation answered 3/4, 2020 at 13:28 Comment(4)
This is such a minimal revision ("Per the selenium documentation, specify the windows driver locations (e.g., chromedriver.exe) rather than the linux driver locations when running on windows") that it should simply be a comment on the accepted answer. Furthermore, It appears that you simply minified the accepted answer to make the code look different.Roebuck
@RobHall Comments are sometimes cleared after years; also sometimes it's hard to extract information from multiple comments, thus this answer. I properly cited the source ("based on the accepted answer"); the devil is really in the details, I spent a lot of time trying and failing before it finally worked, so my goal was really to put a ready-to-use code for Windows as an answer.Selfannihilation
I tried searching for the saved file but can't find it anywhere. Any idea where the file goes after being saved as PDF?Sceptre
The saved file would be in Downloads. Does anyone know if I can add a delay so the page loads properly, or if I can change the default download location?Hoagy
B
3

This solution is not very good, but you can take a screenshot and convert it to PDF with Pillow:

from selenium import webdriver
from io import BytesIO
from PIL import Image

driver = webdriver.Chrome(executable_path='path to your driver')
driver.get('your url here')
img = Image.open(BytesIO(driver.find_element_by_tag_name('body').screenshot_as_png))
img.save('filename.pdf', "PDF", quality=100)
Brisance answered 18/7, 2019 at 14:15 Comment(9)
Thank you for your answer. The issue with this approach is that it does not work for multi-page webpages. Only a portion of information is captured. But it is a good solution for short pages and does not entail popups.Benn
what do you mean by multi-page webpages?Brisance
meaning web pages that need scrolling to see the complete webpage and when printed as PDF fit on 3-4 sheets of papers.Benn
You can use this code: https://mcmap.net/q/75735/-take-screenshot-of-full-page-with-selenium-python-with-chromedriver , and at the end save as PDF. P.S. Sorry, I did not quite understand. Do you want to fit the entire page on 1 sheet when printing, or something else?Brisance
so what I ideally want to be able to do - is print a page as pdf. on a Mac, when you do that, the PDF generated can run into many pages - assuming PDF is created for letter or A4 sized printing. if I shrink the page a lot and take a screenshot that does not serve the purpose. although, now I understand that Selenium does not control the dialog boxes of the browser, and hence cannot print page as PDF. apparently, puppeteer or pyppeteer in python can do that but I do not know how to use that software yet. the link you shared, seems to talk about screenshot and not pdf...Benn
you can save screenshot as pdf, why not?Brisance
I can. But the page that I wanted to save runs into many screens of vertical scrolling. And so it would become multiple and page downs and then converting each screenshot to PDF and then combining the PDFs. Just thought of this based on your comment. Still seems rather kludgy, and I was hoping that there will be a better solution. Pyppeteer might allow me to do it in Python it seems, but I do not know how to use that. :(Benn
I think this would solve my problem. miyakogi.github.io/pyppeteer/_modules/pyppeteer/…. However, I do not know async and await, and need to learn those before trying to use Pyppeteer. Just hard to believe that Selenium could not do it as I had sort of learnt it...Benn
I won't say I am upset. :) Selenium is free and thanks to the team for that! It is just that I was certain that it could be done and believed that I did not know the right syntax or options as to how to enable PDF printing in Selenium. Using kludges does not seem like the right thing. It will eventually break.Benn
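
For what it's worth, Selenium 4 and later expose the WebDriver print command as `driver.print_page()`, which returns a paginated PDF as a base64 string without going through the browser's print dialog. Below is a minimal sketch, assuming Selenium 4+ and an already-open `driver` that has loaded the page (after login, if needed). The helper name `save_pdf_with_print_page` is hypothetical, and depending on the Chrome and ChromeDriver versions the command may only work in headless mode.

```python
import base64


def save_pdf_with_print_page(driver, path):
    """Write the page currently loaded in `driver` to `path` as a PDF."""
    # print_page() returns the PDF as a base64-encoded string (Selenium 4+).
    encoded = driver.print_page()
    with open(path, "wb") as f:
        f.write(base64.b64decode(encoded))
    return path
```

Because the PDF is paginated by the browser, a long page that needs scrolling comes out as several sheets, which is the case the screenshot approach does not cover.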
T
0

You can try to use the selenium-print package.

It uses selenium's execute_cdp_cmd function behind the scenes, which is fairly easy to use. The parameters can be found here.

import base64
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
options = webdriver.ChromeOptions()
service = Service()
driver = webdriver.Chrome(service=service, options=options)
driver.get('http://localhost:3000')
time.sleep(2)
pdf = driver.execute_cdp_cmd("Page.printToPDF", {"printBackground": True})
pdf_data = base64.b64decode(pdf["data"])
with open("test.pdf", "wb") as f:
    f.write(pdf_data)
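
The same call can be wrapped in a small helper so that extra `Page.printToPDF` parameters, such as paper size, are passed through. This is only a sketch: the helper name `save_pdf_via_cdp` is hypothetical, and it assumes a Chromium-based `driver` that supports `execute_cdp_cmd`. `paperWidth` and `paperHeight` are given in inches in the DevTools protocol.

```python
import base64


def save_pdf_via_cdp(driver, path, **print_params):
    """Render the current page with CDP Page.printToPDF and write it to `path`.

    Returns the number of bytes written.
    """
    params = {"printBackground": True}
    params.update(print_params)  # caller-supplied values override the default
    result = driver.execute_cdp_cmd("Page.printToPDF", params)
    # The command returns the PDF as a base64 string under the "data" key.
    pdf_bytes = base64.b64decode(result["data"])
    with open(path, "wb") as f:
        f.write(pdf_bytes)
    return len(pdf_bytes)
```

For example, `save_pdf_via_cdp(driver, "test.pdf", paperWidth=5.83, paperHeight=8.27)` would request roughly A5 output. Since this route does not use the print dialog, it should also be usable with `--headless`, though that is worth verifying against the Chrome version in use.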
Transferase answered 29/3 at 14:15 Comment(0)
D
-6

I would suggest downloading the page source HTML, which can be done like so in VB.NET:

Dim Html As String = webdriver.PageSource

I'm not sure how it is done in Python, but I'm sure it's very similar. Once you have done that, you can select the parts of the page you want to save, either with an HTML parser or by parsing it manually with string-handling code. Once the HTML for the part you want to save is stored in a string, use an HTML-to-PDF converter library or program. There are lots of these for languages like C# and VB.NET. I don't know of any for Python, but I'm sure some exist; just do some research (some are free and some are expensive).
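
For reference, the Python Selenium bindings expose the same thing as the `page_source` attribute of the driver. A minimal sketch, assuming an already-open `driver`; the helper name `save_page_source` is hypothetical, and the HTML-to-PDF conversion step itself is not shown.

```python
def save_page_source(driver, path):
    """Write the HTML of the page currently loaded in `driver` to `path`."""
    # page_source is the Python counterpart of VB.NET's webdriver.PageSource.
    html = driver.page_source
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

The returned string, or the saved file, could then be handed to whichever converter is chosen.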

Diondione answered 29/4, 2021 at 5:10 Comment(1)
I've been using the converter approach and it is not great. The most common converter, wkhtmltopdf, lives in the 13th century, so either you put your medieval armour, forget all about flex and grid and go back to <table> layouting or you'll get zilch. Alternatives are even worse. Speaking of the 13th century, vb.net?!? In general, I don't hold a candle for two types of SO responses: 1) "Here's something I threw together and never actually tried. Good luck!", and 2) "Why would you want to do that?". Yours is type 1. Not as bad as type 2, but still a time waster.Emarie
