How can I download a file on a click event using selenium?

Asked 26/8, 2013 at 8:32 Answered 10/6, 2023 at 20:53

Solved python selenium selenium-webdriver web-scraping

I am working with Python and Selenium. I want to download a file by triggering a click event with Selenium. I wrote the following code:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys

browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")

browser.close()

I want to download both files linked as "Export Data" at the given URL. How can I do this, given that the download only works through a click event?

Saturant answered 26/8, 2013 at 8:32

I recommend using urllib: call urllib.urlretrieve(url) to get the download, where url is the URL the link sends you to. – Mablemabry 26/8, 2013 at 8:35

No, because it only works with the click event. – Saturant 26/8, 2013 at 8:37

But if you parse the HTML of the page, you can get the URL that the click event sends the browser to and use that. – Mablemabry 26/8, 2013 at 8:38

Oh, never mind. Now that I look at the page, you're right, my bad. – Mablemabry 26/8, 2013 at 8:40

Possible duplicate of How to download any file and save it to the desired location using Selenium Webdriver – Waynant 16/10, 2017 at 15:45
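Where a link does expose a plain URL, the urllib suggestion from the comments can still work, as long as the request carries the browser's session. A minimal sketch, assuming Python 3 (where urllib.urlretrieve lives in urllib.request) and a server that only checks the session cookies and user agent; cookie_header and fetch_with_browser_session are hypothetical helper names. It does not help when the file is produced purely by JavaScript, which is the case the asker describes.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join cookies from driver.get_cookies() into one Cookie header value."""
    # Selenium returns each cookie as a dict with at least "name" and "value" keys
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_with_browser_session(driver, url, dest_path):
    """Hypothetical helper: download url to dest_path reusing a live Selenium session.

    Assumes the server only looks at cookies and the user agent.
    """
    request = urllib.request.Request(
        url,
        headers={
            "Cookie": cookie_header(driver.get_cookies()),
            "User-Agent": driver.execute_script("return navigator.userAgent;"),
        },
    )
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as fh:
        fh.write(response.read())
```

Usage would look like fetch_with_browser_session(browser, link.get_attribute("href"), "export.csv"), where link is an element you have already located.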

Find the link using find_element(s)_by_*, then call its click method.

from selenium import webdriver

# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')

browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")

browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()

Added profile-manipulation code to prevent the download dialog.
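The code above targets the Selenium 2/3 API; the find_element_by_* methods were removed in Selenium 4.3, and Selenium 4 configures Firefox through a FirefoxOptions object. A minimal sketch of the same idea for Selenium 4, assuming the two "Export Data" links still have the ids exportpt and exporthlgt and serve text/csv; firefox_download_prefs, make_firefox, and click_export_links are hypothetical helper names. It uses an absolute download path (the comments below report that relative paths are not respected) and can optionally run Firefox headless, which the comments also ask about. Firefox preference names can change between releases, so treat them as a starting point.

```python
import os


def firefox_download_prefs(download_dir, mime_types=("text/csv",)):
    """about:config preferences that save matching downloads without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.manager.showWhenStarting": False,
        # Use an absolute path; relative paths were reported not to work
        "browser.download.dir": os.path.abspath(download_dir),
        # Comma-separated MIME types that are saved straight to disk
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_firefox(download_dir, headless=False):
    # Imported here so firefox_download_prefs can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    if headless:
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def click_export_links(browser, url="http://www.drugcite.com/?q=ACTIMMUNE"):
    from selenium.webdriver.common.by import By

    browser.get(url)
    # Selenium 4 replacement for find_element_by_id
    for element_id in ("exportpt", "exporthlgt"):
        browser.find_element(By.ID, element_id).click()
```

Usage: browser = make_firefox("downloads", headless=True), then click_export_links(browser), and wait for the files to finish before calling browser.quit(), since quitting immediately can interrupt a download in progress.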

Subinfeudate answered 26/8, 2013 at 9:6

What should be done if I want to hide the browser, or keep it minimized, while processing? – Saturant 26/8, 2013 at 9:22

@sam, Search for headless + selenium + firefox. – Subinfeudate 26/8, 2013 at 9:25

@sam, or PhantomJS/GhostDriver. – Subinfeudate 26/8, 2013 at 9:26

@sam, Sorry, I don't know how to do that. See close firefox automatically when download is complete. – Subinfeudate 26/8, 2013 at 9:34

@falsetru, this solution works. I'll have to research how to redirect the download to another location instead of the /tmp folder on Ubuntu. – Primp 15/4, 2014 at 17:15

@Saturant You may use PyVirtualDisplay for running firefox headless. It worked for me. – Bodyguard 27/10, 2015 at 7:16

I'm still getting the download dialog box. – Bodyguard 27/10, 2015 at 7:39

Hi, I'm trying to do the same thing (and it's working), but I'm wondering if anyone knows how to control the download location. Files are automatically downloaded to my Downloads folder, but I'd like to route them to the folder where my .py file is located (so that I can then import them directly with the script). Thanks! – Airborne 21/7, 2016 at 17:6

Never mind, I found the answer. Here it is, in case anyone else needs it: #25252083 – Airborne 21/7, 2016 at 17:13

Is there a way to set (or at least get) the path including the filename of the downloaded element? When the download is triggered by JavaScript / onClick I don't think there is a trivial way to get the name by inspecting the source. – Knudson 17/8, 2017 at 14:18

@MartinThoma, how about creating a temporary directory and calling os.listdir('/path/to/the/directory') (optionally sorting by ctime/mtime)? – Subinfeudate 17/8, 2017 at 15:41

@Subinfeudate Nice idea. I've just realized that I'm using chromium and that it might be completely different there. – Knudson 17/8, 2017 at 15:47
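A minimal sketch of the temporary-directory suggestion, assuming the browser has already been configured to save into a dedicated directory; wait_for_new_file is a hypothetical helper name. It snapshots the directory before the click, then polls until a new file appears and no partial-download files remain (Firefox writes a .part file while downloading, Chrome and Chromium write .crdownload). It returns the full path, including the filename, of the downloaded file.

```python
import os
import time

# Temporary suffixes used while a download is still in progress (Firefox, Chrome/Chromium)
PARTIAL_SUFFIXES = (".part", ".crdownload")


def wait_for_new_file(directory, before, timeout=30.0, poll=0.5):
    """Return the path of a file that appeared in `directory` since the snapshot `before`.

    Waits until no partial-download files remain, then returns the newest new file
    by modification time. Raises TimeoutError if nothing completes in time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = set(os.listdir(directory))
        in_progress = [n for n in names if n.endswith(PARTIAL_SUFFIXES)]
        finished = [n for n in names - before if not n.endswith(PARTIAL_SUFFIXES)]
        if finished and not in_progress:
            newest = max(finished, key=lambda n: os.path.getmtime(os.path.join(directory, n)))
            return os.path.join(directory, newest)
        time.sleep(poll)
    raise TimeoutError(f"no completed download appeared in {directory!r} within {timeout}s")
```

Usage: take before = set(os.listdir(download_dir)) just before the click, then call wait_for_new_file(download_dir, before) right after it.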

Selenium is really making this hard to do. I'm still getting the download dialog box :( – Samara 26/5, 2019 at 9:7

The browser.download.dir setting no longer seems to be respected. Selenium/Firefox just dumps all downloads straight into your home directory regardless of this setting. – Picnic 12/8, 2020 at 5:24

@Cerin, according to this answer, you should set browser.download.folderList to 2. – Subinfeudate 12/8, 2020 at 7:26

@Subinfeudate I found the issue was that I was using a relative path. It works as described if you use an absolute path. – Picnic 12/8, 2020 at 14:57

I'm still getting dialog box and I found solution here: reddit.com/r/learnpython/comments/42hg1c/comment/czaiain/… – Bastia 18/1, 2022 at 12:27

I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.

Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...

Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay

The Magic

import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json

root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'

print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')

print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')

print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')

print('Injecting retrieval code into web page')
driver.execute_script("""
    window.file_contents = null;
    var xhr = new XMLHttpRequest();
    xhr.responseType = 'blob';
    xhr.onload = function() {
        var reader  = new FileReader();
        reader.onloadend = function() {
            window.file_contents = reader.result;
        };
        reader.readAsDataURL(xhr.response);
    };
    xhr.open('GET', %(download_url)s);
    xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
    'download_url': json.dumps(download_url),
})

print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
    # Returns the retrieved file base64-encoded (suitable for binary data)
    downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
    if downloaded_file is None:
        print('\tNot downloaded, waiting...')
        time.sleep(0.5)
print('\tDone')

print('Writing file to disk')
with open('google-logo.png', 'wb') as fp:
    fp.write(base64.b64decode(downloaded_file))
print('\tDone')
driver.quit()  # quit the browser, or it'll persist after Python exits.
display.stop()  # stop the virtual display, or it'll persist after Python exits.

Explanation

We first load a URL on the domain we're downloading the file from. This lets us make an AJAX request to that domain without running into cross-origin (same-origin policy) restrictions.

Next, we inject some JavaScript into the page that fires off an AJAX request. Once the request returns, we load the response into a FileReader object and call readAsDataURL(), which yields the file's content as a base64-encoded data URL. We then store that data URL on window, a globally accessible object.

Finally, because the AJAX request is asynchronous, we enter a Python while loop that waits for the content to appear on window. Once it does, we strip the data URL prefix, decode the base64 content, and save it to a file.

This solution should work across all modern browsers supported by Selenium, for both text and binary content, and for any MIME type.
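The Python side of that hand-off only has to split the data URL that readAsDataURL() produces (data:<mime>;base64,<payload>) and decode the payload. A small sketch that also keeps the MIME type, which can help pick a file extension; decode_data_url is a hypothetical helper name, and it assumes a base64 data URL, which is what readAsDataURL() returns.

```python
import base64


def decode_data_url(data_url):
    """Return (mime_type, raw_bytes) for a base64 data URL such as readAsDataURL() yields."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    # Everything between "data:" and ";base64", e.g. "image/png" (may carry parameters)
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

In the script above, it could replace the split in JavaScript: return window.file_contents as-is and call decode_data_url on it in Python.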

Alternate Approach

While I haven't tested this, Selenium does let you wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could have the injected code create an element with a particular ID once the download finishes, and use the appearance of that element as the signal to retrieve the downloaded file.
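A minimal, untested sketch of that alternative: the injected script publishes the data URL on a hidden marker element, and WebDriverWait with expected_conditions.presence_of_element_located waits for it instead of a hand-written polling loop. The marker id, the data-result attribute, and the helper names are hypothetical. Like the original, it only works for same-origin URLs, and a failed request simply ends in Selenium's TimeoutException.

```python
import base64
import json

# Injected JavaScript: fetch the URL, then expose the result as a data URL
# on a hidden marker element that Selenium can wait for.
FETCH_TO_MARKER_JS = """
var old = document.getElementById(%(marker_id)s);
if (old) { old.remove(); }
var xhr = new XMLHttpRequest();
xhr.responseType = 'blob';
xhr.onload = function() {
    var reader = new FileReader();
    reader.onloadend = function() {
        var marker = document.createElement('div');
        marker.id = %(marker_id)s;
        marker.style.display = 'none';
        marker.setAttribute('data-result', reader.result);
        document.body.appendChild(marker);
    };
    reader.readAsDataURL(xhr.response);
};
xhr.open('GET', %(url)s);
xhr.send();
"""


def build_fetch_script(url, marker_id="selenium-download-marker"):
    # json.dumps yields safely quoted JavaScript string literals
    return FETCH_TO_MARKER_JS % {"url": json.dumps(url), "marker_id": json.dumps(marker_id)}


def download_via_marker(driver, url, timeout=30, marker_id="selenium-download-marker"):
    # Imported here so build_fetch_script can be used without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.execute_script(build_fetch_script(url, marker_id))
    marker = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.ID, marker_id))
    )
    data_url = marker.get_attribute("data-result")
    # Drop the "data:<mime>;base64," prefix and decode the payload
    return base64.b64decode(data_url.split(",", 1)[1])
```

Storing a large file in a DOM attribute is not free, so for big downloads the original window variable may be the better carrier.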

Helve answered 14/4, 2016 at 3:50

I have a download button for a PDF that is behind a captcha, so is tied to the session. The download_url I have is not to a .pdf file, but to a javascript page with a $(document).ready(function () { which calls a $.post() to the actual PDF. When I use your solution I end up downloading a HTML file rather than the PDF I want to download. How would I adapt this in this situation? – Hukill 4/6, 2020 at 0:13

In Chrome, I download the files by clicking on the links, then open the chrome://downloads page and retrieve the list of downloaded files from the shadow DOM like this:

docs = document
  .querySelector('downloads-manager')
  .shadowRoot.querySelector('#downloads-list')
  .getElementsByTagName('downloads-item')

This solution is limited to Chrome; the data also contains information such as the file path and download date. (Note: this code is JavaScript, not Python.)
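Since the question is about Python, here is a sketch of the same lookup driven from Selenium's Python bindings. It assumes Chrome, and that chrome://downloads still uses the downloads-manager, #downloads-list, and downloads-item structure shown above; that markup is internal to Chrome and may change between versions. list_chrome_download_items is a hypothetical helper name.

```python
# Same shadow-DOM lookup as the JavaScript above, returned to Python.
# Selenium converts the returned HTMLCollection into a list of WebElements.
DOWNLOAD_ITEMS_JS = """
return document
  .querySelector('downloads-manager')
  .shadowRoot.querySelector('#downloads-list')
  .getElementsByTagName('downloads-item');
"""


def list_chrome_download_items(driver):
    """Open chrome://downloads in the current tab and return its downloads-item elements."""
    driver.get("chrome://downloads")
    return driver.execute_script(DOWNLOAD_ITEMS_JS)
```

Note that this navigates the current tab away from the page you were on, and the list may need a moment to populate after the page loads.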

Deckert answered 18/5, 2017 at 22:15

Please note the question tag. It's a python question, not JS! – Tertullian 7/4, 2021 at 14:35

Here is the full working code. You can use Selenium to fill in the username, password, and other fields. To find the names of the fields on the web page, use Inspect Element. Elements (username, password, or a button to click) can be located by class or by name.

from selenium import webdriver
from selenium.webdriver.common.by import By

# Using Chrome to access the web
options = webdriver.ChromeOptions()
# The download path is a Chrome preference, not a command-line switch,
# so it has to be set through the "prefs" experimental option
options.add_experimental_option("prefs", {"download.default_directory": "C:/Test"})
driver = webdriver.Chrome(options=options)
# Open the website
try:
    driver.get('xxxx')  # Your website address
    password_box = driver.find_element(By.NAME, 'password')
    password_box.send_keys('xxxx')  # Password
    download_button = driver.find_element(By.CLASS_NAME, 'link_w_pass')
    download_button.click()
    # Quitting right away can cut off a download that is still in progress
finally:
    driver.quit()

Principle answered 8/9, 2019 at 11:5

The simplest method for Chrome is to use add_experimental_option, especially if you are using a remote WebDriver:

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
preferences = {"download.default_directory": "/some/path"}
# Be sure to add preferences as an experimental option
chrome_options.add_experimental_option("prefs", preferences)

driver = webdriver.Remote(
  command_executor="http://localhost:4444",
  options=chrome_options
)
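A sketch that builds the same prefs dictionary and also sets download.prompt_for_download to False, so Chrome saves files without asking where to put them. chrome_download_prefs and make_remote_chrome are hypothetical helper names. With webdriver.Remote, the path is interpreted on the machine running the browser (the Grid node), not on the machine running your script, so the directory has to exist there.

```python
def chrome_download_prefs(download_dir, prompt=False):
    """Chrome preferences for saving downloads into download_dir."""
    return {
        # Resolved on the machine that runs the browser (the node, for webdriver.Remote)
        "download.default_directory": download_dir,
        # False: save automatically instead of asking where to save each file
        "download.prompt_for_download": prompt,
    }


def make_remote_chrome(download_dir, command_executor="http://localhost:4444"):
    # Imported here so chrome_download_prefs can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return webdriver.Remote(command_executor=command_executor, options=options)
```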

Earleanearleen answered 10/6, 2023 at 20:53
