Downloading a PDF using Selenium, Chrome and Python
C

3

20

I tried following previous posts on this topic such as these (post 1, post 2), but I'm still stuck.

My script has to log into a site using a set of credentials, then navigate through some drop-down menus to select a report. Once the report is selected, a new window pops up where parameters must be adjusted to generate the report. Once the parameters are set, the same pop-up window refreshes with the generated report in PDF format, displayed using Chrome's built-in PDF viewer. I was under the impression that passing certain options to the webdriver would disable this PDF viewer and simply download the file, but the PDF viewer is still being displayed and nothing is automatically downloaded. Surely I'm missing something or I wrote something incorrectly. Here's the gist of my code:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option('prefs',  {
    "download.default_directory": download_dir,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "plugins.plugins_disabled": ["Chrome PDF Viewer"]
    }
)

browser = webdriver.Chrome(options = chrome_options)

driver = webdriver.Chrome()
driver.get(url)

# In between are a bunch of steps that navigate through the drop-down menus

# This step may not be necessary, but I included it to handle the point where the pop-up window refreshes and displays the report in PDF format through Chrome's PDF viewer
driver.switch_to.window(driver.window_handles[1])

So, at this point, Chrome still displays the PDF viewer even though I disabled it earlier. Nothing is downloaded, so I'm wondering if I need to provide another line of code or perhaps something else.

Using Selenium 3.141.0, Python 3.6.4, and ChromeDriver 2.45 on Windows 10.

Cochise answered 1/1, 2019 at 20:24 Comment(1)
I had a similar problem, but in .NET, so I don't have a Python answer for you (hence this comment). In general, you need to send the following command to Chrome via Selenium: Page.printToPDF. My .NET call looked like: dynamic result = new DDict(chrome.ExecuteChromeCommandWithResult("Page.printToPDF", printToPdfOpts)); then File.WriteAllBytes(pdfPath, Convert.FromBase64String(result.data)); chromedevtools.github.io/devtools-protocol/tot/Page/…Tanked
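A rough Python counterpart of the .NET call in this comment could look like the sketch below. It assumes a Selenium version whose Chrome driver exposes `execute_cdp_cmd`, and the helper name `save_page_as_pdf` is hypothetical. Keep in mind that Page.printToPDF renders the current page to a new PDF rather than saving the original file, and that it may only work in headless Chrome depending on the version, so treat this as something to try rather than a guaranteed fix.

```python
import base64


def save_page_as_pdf(driver, pdf_path, print_options=None):
    """Ask Chrome to render the current page as a PDF and write it to pdf_path.

    `driver` is assumed to be a Selenium Chrome driver (or any object with a
    compatible execute_cdp_cmd method).
    """
    # Page.printToPDF returns the PDF as a base64 string in the "data" field
    result = driver.execute_cdp_cmd("Page.printToPDF", print_options or {})
    pdf_bytes = base64.b64decode(result["data"])
    with open(pdf_path, "wb") as fh:
        fh.write(pdf_bytes)
    # Return the number of bytes written, which is handy for a sanity check
    return len(pdf_bytes)
```

After switching to the window that shows the report, this would be called as `save_page_as_pdf(driver, "report.pdf")`, where the file name is only an example.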
D
34

You need to replace "plugins.plugins_disabled": ["Chrome PDF Viewer"]

With:

"plugins.always_open_pdf_externally": True

Hope this helps you!
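Put together, the corrected configuration might look like the minimal sketch below. The helper names `build_pdf_download_prefs` and `make_driver` are hypothetical, and `download_dir` is whatever folder you choose. Note that the prefs only take effect on the driver instance that receives the options; the question's snippet creates a second `webdriver.Chrome()` without them, which would ignore these settings.

```python
def build_pdf_download_prefs(download_dir):
    """Return Chrome prefs that download PDFs instead of opening the viewer."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        # Replaces the "plugins.plugins_disabled" entry from the question
        "plugins.always_open_pdf_externally": True,
    }


def make_driver(download_dir):
    """Create a Chrome driver with the prefs applied (hypothetical helper)."""
    # Imported here so the prefs helper can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    chrome_options = Options()
    chrome_options.add_experimental_option(
        "prefs", build_pdf_download_prefs(download_dir)
    )
    # Pass the options to the same driver instance that loads the page
    return webdriver.Chrome(options=chrome_options)
```

The returned driver is then the only one used for `driver.get(url)` and the later window switching.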

Displeasure answered 2/1, 2019 at 10:27 Comment(5)
I tried with the following changes, but it still opens up the PDF in the Chrome PDF viewer: chrome_options.add_experimental_option('prefs', { "download.default_directory": download_dir, "download.prompt_for_download": False, "plugins.always_open_pdf_externally": True } )Cochise
@sangharsh what version are you using?Displeasure
It worked. There was another problem in my code, which I found and corrected. Now this solution works. The best one so far! Thanks!! I have already upvotedGanny
I'm facing the same issue. I have used the above lines of code and it still opens the PDF in a new window instead of downloading it. The report is generated through Microstrategy.Halinahalite
Hi @Halinahalite are you passing the options to the driver instance?Displeasure
W
3

I had a similar problem, which I solved with the Firefox driver in Java. Here is my code:

ffprofile.setPreference("browser.helperApps.neverAsk.saveToDisk","application/pdf");
ffprofile.setPreference("browser.download.folderList", 2);
ffprofile.setPreference("browser.download.manager.showWhenStarting", false);
ffprofile.setPreference("browser.download.dir", "path/to/directory");
ffprofile.setPreference("plugin.scan.plid.all",false);
ffprofile.setPreference("plugin.scan.Acrobat","99.0");
ffprofile.setPreference("pdfjs.disabled",true);

Using Firefox may be an option for you, and translating the Java to Python should be simple.
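One possible Python translation is sketched below. It sets the same preferences through Selenium's Firefox `Options.set_preference` rather than a profile object like the Java `ffprofile`, which is a substitution on my part; the names `FIREFOX_PDF_PREFS` and `make_firefox_driver` are hypothetical, and the download directory is the same placeholder as in the Java code.

```python
# Same preferences as the Java snippet, expressed as a plain Python dict
FIREFOX_PDF_PREFS = {
    "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
    "browser.download.folderList": 2,
    "browser.download.manager.showWhenStarting": False,
    "browser.download.dir": "path/to/directory",
    "plugin.scan.plid.all": False,
    "plugin.scan.Acrobat": "99.0",
    "pdfjs.disabled": True,
}


def make_firefox_driver(prefs=FIREFOX_PDF_PREFS):
    """Create a Firefox driver with the download prefs applied."""
    # Imported here so the dict above can be inspected without Selenium
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    firefox_options = Options()
    for name, value in prefs.items():
        # Python booleans and ints map to the Java true/false and int values
        firefox_options.set_preference(name, value)
    return webdriver.Firefox(options=firefox_options)
```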

Wrecker answered 2/1, 2019 at 10:36 Comment(1)
How do I do this Java-to-Python translation?Ganny
U
0

I was able to download it programmatically using Selenium ActionChains and Keys.

The problem I had was that I could not deactivate the PDF plugin as the other solutions do, because doing so triggers a Captcha.

So for those of you who don't want to deactivate the native PDF plugin, this code navigates to the print icon in the Chrome PDF viewer and activates it using the keyboard.

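import time

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

# Remember the original window so we can return to it afterwards
original_window = driver.current_window_handle

# OPEN A NEW TAB
driver.switch_to.new_window('tab')
driver.get("Your PDF link")

# AT THE PDF PAGE: tab to the print icon, then press Enter
# (the number of TAB presses may differ between Chrome versions)
for i in range(9):
    ActionChains(driver).send_keys(Keys.TAB).perform()
    time.sleep(0.2)
ActionChains(driver).send_keys(Keys.ENTER).perform()

# RETURN TO THE ORIGINAL PAGE
driver.close()
driver.switch_to.window(original_window)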

You will also need to set these preferences on your driver so that it prints automatically and skips the print dialog.

chrome_prefs = {
    'printing.print_preview_sticky_settings.appState': '{"recentDestinations":[{"id":"Save as PDF","origin":"local"}],"selectedDestinationId":"Save as PDF","version":2}',
    'savefile.default_directory': f'{saveFolder}',  # Replace with your desired directory path
}

options.add_experimental_option('prefs', chrome_prefs)
options.add_argument('--kiosk-printing')
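As a self-contained variant of the preferences above, the sketch below builds the `appState` string with `json.dumps` instead of writing the JSON by hand, which avoids quoting mistakes. The helper names `build_kiosk_print_prefs` and `make_printing_options` are hypothetical, and `save_folder` stands in for `saveFolder`.

```python
import json


def build_kiosk_print_prefs(save_folder):
    """Return prefs that preselect 'Save as PDF' in Chrome's print dialog."""
    app_state = {
        "recentDestinations": [{"id": "Save as PDF", "origin": "local"}],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    return {
        # Chrome expects this pref as a JSON-encoded string, not a dict
        "printing.print_preview_sticky_settings.appState": json.dumps(app_state),
        "savefile.default_directory": str(save_folder),
    }


def make_printing_options(save_folder):
    """Build Chrome options that print straight to PDF without a dialog."""
    # Imported here so the prefs helper can be used without Selenium installed
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_experimental_option("prefs", build_kiosk_print_prefs(save_folder))
    options.add_argument("--kiosk-printing")
    return options
```

The resulting options object would then be passed to `webdriver.Chrome(options=...)` before running the keyboard steps.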
Usual answered 9/6 at 6:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.