Selenium (Python) - waiting for a download process to complete using Chrome web driver
A

12

22

I'm using Selenium and Python with ChromeDriver (Windows) to automate downloading a large number of files from different pages. My code works, but the solution is far from ideal: the function below clicks a button on the website, which triggers a JavaScript function that generates a PDF file and then downloads it.

I had to use a static wait for the download to complete (ugly). I cannot check the file system to verify when the download is complete, since I'm using multithreading (downloading lots of files from different pages at once) and the file names are generated dynamically by the website itself.

My code:

def file_download(num, drivervar):
    global Counter  # assumed to be a module-level counter defined elsewhere
    Counter += 1
    try:
        drivervar.get(url[num])
        download_button = WebDriverWait(drivervar, 20).until(EC.element_to_be_clickable((By.ID, 'download button ID')))
        download_button.click()
        time.sleep(10)  # static wait for the download to finish
    except TimeoutException: # Retry once
        print('Timeout in thread number: ' + str(num) + ', retrying...')
..... 

Is it possible to determine download completion in webdriver? I want to avoid using time.sleep(x).

Thanks a lot.

Arrest answered 15/1, 2018 at 12:45 Comment(3)
If nothing in the UI denotes completion, then there is no way to tell besides checking the file system. Why can't you do that? - Keepsake
I've never tried this before, but can you set the download path to some specific value? E.g. create a timestamped folder per run and then point the download path to that folder? That way you will only get one file per folder and will be able to determine when the download is complete. My understanding is that you can't change the download path after you've instantiated the driver, so keep that in mind if you try this approach. You can write another script that does the cleanup after the first script is done, e.g. collects all the files into a single folder and deletes all subfolders. - Sofia
Possible duplicate of How to detect when all downloads finished with Selenium Webdriver and Firefox - Blunk
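
The per-thread folder idea from the comment above could look roughly like the sketch below. It is not tested against the asker's site. `make_download_prefs` and `make_driver` are hypothetical helper names, and the sketch assumes Chrome's `download.default_directory` preference is passed through `add_experimental_option("prefs", ...)` before the driver is created. Each thread then has exactly one folder to watch, so the dynamically generated file names no longer matter.

```python
from pathlib import Path


def make_download_prefs(base_dir, worker_id):
    """Create a dedicated download folder for one worker and build Chrome prefs for it.

    Hypothetical helper: the folder naming scheme is an assumption.
    """
    download_dir = Path(base_dir) / f"worker_{worker_id}"
    download_dir.mkdir(parents=True, exist_ok=True)
    prefs = {
        # Chrome expects an absolute path here
        "download.default_directory": str(download_dir.resolve()),
        # Do not show a "Save as" dialog
        "download.prompt_for_download": False,
    }
    return download_dir, prefs


def make_driver(prefs):
    """Start Chrome with the given prefs (Selenium is imported lazily)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", prefs)
    return webdriver.Chrome(options=options)
```

Each thread would call `make_download_prefs(base_dir, thread_number)` once, pass the returned prefs to `make_driver`, and afterwards only look at its own folder.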
C
56

You can get the status of each download by visiting chrome://downloads/ with the driver.

To wait for all the downloads to finish and to list all the paths:

def every_downloads_chrome(driver):
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")
    return driver.execute_script("""
        var items = document.querySelector('downloads-manager')
            .shadowRoot.getElementById('downloadsList').items;
        if (items.every(e => e.state === "COMPLETE"))
            return items.map(e => e.fileUrl || e.file_url);
        """)


# waits for all the files to be completed and returns the paths
paths = WebDriverWait(driver, 120, 1).until(every_downloads_chrome)
print(paths)

Updated to support changes up to Chrome version 81.

Convolve answered 15/1, 2018 at 17:22 Comment(7)
Great idea, I will try it out. Thanks! - Arrest
Amazing thoughts (+1) - Blunk
The API doc at developers.chrome.com/extensions/downloads says to use "filename" to get the downloaded file name. However, based on my test in Python, the names are case-sensitive and should be "fileName" and "filePath". - Calumet
The above code was working well until today, then failed. I had to change file_url to fileUrl to make it work again. - Gasholder
Unfortunately this does not work in headless mode. You will get selenium.common.exceptions.JavascriptException: Message: javascript error: downloads is not defined - Erotic
This no longer works with Chrome 80, see my answer below. - Gasholder
paths returns a list of downloaded file paths. To get your downloaded file path, use paths[0]. - Nickname
E
13

I had the same problem and found a solution. You can check whether a .crdownload file is in your download folder. If there are 0 files with the .crdownload extension in the download folder, then all your downloads are completed. This only works for Chrome and Chromium, I think.

import os
import time

def downloads_done():
    for filename in os.listdir("/downloads"):
        if ".crdownload" in filename:
            time.sleep(0.5)
            # a download is still in progress: check the folder again
            return downloads_done()

Whenever you call downloads_done(), it keeps calling itself until all downloads are completed. If you are downloading massive files, like 80 gigabytes, I don't recommend this because the function can reach the maximum recursion depth.

2020 edit:

def wait_for_downloads():
    print("Waiting for downloads", end="")
    while any([filename.endswith(".crdownload") for filename in 
               os.listdir("/downloads")]):
        time.sleep(2)
        print(".", end="")
    print("done!")

The "end" keyword argument of print() defaults to a newline, but we replace it with an empty string. While any filename in the /downloads folder ends with .crdownload, the loop sleeps for 2 seconds and prints one dot to the console without a newline.
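
If you don't want the loop to hang forever on a stalled download, a variant with a timeout might look like the sketch below. The function name `wait_for_crdownload_gone` and its parameters are made up for illustration. Note one assumption: it returns immediately if it is called before Chrome has created the .crdownload file, so it should only be called once the download has visibly started.

```python
import os
import time


def wait_for_crdownload_gone(download_dir, timeout=120.0, poll=0.5):
    """Return True once no *.crdownload file is left in download_dir, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        pending = [f for f in os.listdir(download_dir) if f.endswith(".crdownload")]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

The caller can then decide what to do on `False`, for example retry the download or log an error.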

I don't really recommend using Selenium anymore after finding out about requests, but if it's a heavily guarded site with Cloudflare, captchas, etc., then you might have to resort to Selenium.

Eduard answered 28/7, 2018 at 21:50 Comment(2)
Great answer. However, I had to modify it a bit to include .tmp files: while any([filename.endswith(".crdownload") or filename.endswith(".tmp") for filename in os.listdir(default_download_directory)]): - Callipygian
Use the complete path of the downloads folder, e.g. C:\\Users\\Dev\\Downloads\\, if you get "The system cannot find the path specified". - Synopsis
G
9

With Chrome 80, I had to change the answer from @florent-b to the code below:

def every_downloads_chrome(driver):
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")
    return driver.execute_script("""
        return document.querySelector('downloads-manager')
        .shadowRoot.querySelector('#downloadsList')
        .items.filter(e => e.state === 'COMPLETE')
        .map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);
        """)

I believe this is backward compatible, i.e. it should also work with older versions of Chrome.

2023-11-04 EDIT

I had to update the code above, from e.state === 'COMPLETE' to e.state == '2', to make the code work with Chromium 119.
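
Since the state value has changed between Chrome versions, one option is to return the raw state to Python and compare it there against every value seen so far. The sketch below is only that, a sketch: the names `LIST_DOWNLOADS_JS`, `COMPLETE_STATES`, `completed_paths` and `all_downloads_complete` are hypothetical, and the accepted values are just the ones reported in this thread ('COMPLETE' and 2), not a verified list for every Chrome release.

```python
# JavaScript run on chrome://downloads/. It returns plain objects so that
# the state comparison can happen in Python.
LIST_DOWNLOADS_JS = """
    return document.querySelector('downloads-manager')
        .shadowRoot.querySelector('#downloadsList')
        .items.map(e => ({
            state: e.state,
            path: e.filePath || e.file_path || e.fileUrl || e.file_url
        }));
"""

# Values reported in this thread for a finished download (assumption:
# other Chrome versions may use something else).
COMPLETE_STATES = {"COMPLETE", 2, "2"}


def completed_paths(items):
    """Return the paths of all items whose state marks a finished download."""
    return [item["path"] for item in items if item["state"] in COMPLETE_STATES]


def all_downloads_complete(driver):
    """WebDriverWait condition: list of paths once every download is done, else False."""
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")
    items = driver.execute_script(LIST_DOWNLOADS_JS)
    paths = completed_paths(items)
    return paths if items and len(paths) == len(items) else False
```

It would be used like the original: `paths = WebDriverWait(driver, 120, 1).until(all_downloads_complete)`.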

Gasholder answered 13/3, 2020 at 21:18 Comment(2)
This works fine with the latest Chrome version, but it fails when Chrome runs in headless mode. Do you have any ideas for it? - Wigwag
I've found the reason: on macOS it can't be run in headless mode, see https://mcmap.net/q/122416/-download-file-through-google-chrome-in-headless-mode - Wigwag
L
5

There are issues with opening chrome://downloads/ when running Chrome in headless mode.

The following function uses a composite approach that works whether the mode is headless or not, choosing the better approach available in each mode.

It assumes that the caller clears all files downloaded at file_download_path after each call to this function.

import os
import time
import logging
from selenium.webdriver.support.ui import WebDriverWait

def wait_for_downloads(driver, file_download_path, headless=False, num_files=1):
    max_delay = 60
    interval_delay = 0.5
    if headless:
        total_delay = 0
        done = False
        while not done and total_delay < max_delay:
            files = os.listdir(file_download_path)
            # Remove system files if present: Mac adds the .DS_Store file
            if '.DS_Store' in files:
                files.remove('.DS_Store')
            if len(files) == num_files and not [f for f in files if f.endswith('.crdownload')]:
                done = True
            else:
                total_delay += interval_delay
                time.sleep(interval_delay)
        if not done:
            logging.error("File(s) couldn't be downloaded")
    else:
        def all_downloads_completed(driver, num_files):
            return driver.execute_script("""
                var items = document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList').items;
                var i;
                var done = false;
                var count = 0;
                for (i = 0; i < items.length; i++) {
                    if (items[i].state === 'COMPLETE') {count++;}
                }
                if (count === %d) {done = true;}
                return done;
                """ % (num_files))

        driver.execute_script("window.open();")
        driver.switch_to.window(driver.window_handles[1])  # switch_to_window() was removed in Selenium 4
        driver.get('chrome://downloads/')
        # Wait for downloads to complete
        WebDriverWait(driver, max_delay, interval_delay).until(lambda d: all_downloads_completed(d, num_files))
        # Clear all downloads from chrome://downloads/
        driver.execute_script("""
            document.querySelector('downloads-manager').shadowRoot
            .querySelector('#toolbar').shadowRoot
            .querySelector('#moreActionsMenu')
            .querySelector('button.clear-all').click()
            """)
        driver.close()
        driver.switch_to.window(driver.window_handles[0])  # switch_to_window() was removed in Selenium 4
Lamarre answered 21/5, 2020 at 12:38 Comment(0)
E
1
import os
import time
import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

class MySeleniumTests(unittest.TestCase):

    selenium = None

    @classmethod
    def setUpClass(cls):
        cls.selenium = webdriver.Firefox(...)

    ...

    def test_download(self):
        os.chdir(self.download_path) # default download directory

        # click the button
        self.selenium.get(...)
        # find_element_by_xpath() was removed in Selenium 4
        self.selenium.find_element(By.XPATH, ...).click()

        # wait for the server to finish its internal task,
        # i.e. until the first file shows up in the directory
        def download_begin(driver):
            if len(os.listdir()) == 0:
                time.sleep(0.5)
                return False
            else:
                return True
        WebDriverWait(self.selenium, 120).until(download_begin) # the max waiting time is 120s

        # wait for the server to finish sending:
        # keep waiting while the total size of the directory is changing
        def download_complete(driver):
            sum_before = -1
            sum_after = sum([os.stat(file).st_size for file in os.listdir()])
            while sum_before != sum_after:
                time.sleep(0.2)
                sum_before = sum_after
                sum_after = sum([os.stat(file).st_size for file in os.listdir()])
            return True
        WebDriverWait(self.selenium, 120).until(download_complete)  # the max waiting time is 120s

You must do these things:

  1. Wait for the server to finish its internal work (for example, a database query).
  2. Wait for the server to finish sending the files.
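
The two steps above can also be written without Selenium at all, as a plain polling function. This is a rough sketch under assumptions: `dir_size`, `wait_until` and `wait_for_download` are hypothetical names, the download folder is assumed to be empty before the click, and "finished" is assumed to mean that the total size stayed the same for a few consecutive polls (which a very slow connection could fool).

```python
import os
import time


def dir_size(path):
    """Total size in bytes of the regular files directly inside path."""
    total = 0
    for entry in os.scandir(path):
        try:
            if entry.is_file():
                total += entry.stat().st_size
        except FileNotFoundError:
            pass  # file was renamed or removed between listing and stat
    return total


def wait_until(predicate, timeout, poll):
    """Poll predicate() until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll)
    return bool(predicate())


def wait_for_download(path, timeout=120.0, poll=0.2, stable_checks=3):
    """Step 1: wait for a file to appear. Step 2: wait until the size stops changing."""
    if not wait_until(lambda: len(os.listdir(path)) > 0, timeout, poll):
        return False  # the server never started sending
    deadline = time.monotonic() + timeout
    last, stable = dir_size(path), 0
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = dir_size(path)
        stable = stable + 1 if current == last else 0
        if stable >= stable_checks:
            return True
        last = current
    return False
```

Requiring several unchanged polls in a row instead of a single one makes a short network stall less likely to be mistaken for completion.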

(my English is not very well)

Equine answered 29/11, 2019 at 8:53 Comment(0)
X
1

To get more than one item returned, I had to change the answer of @thdox to the code below:

def every_downloads_chrome(driver):
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")
    return driver.execute_script("""
        var elements = document.querySelector('downloads-manager')
        .shadowRoot.querySelector('#downloadsList')
        .items
        if (elements.every(e => e.state === 'COMPLETE'))
        return elements.map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);
        """)
Xenia answered 15/9, 2020 at 14:14 Comment(0)
S
0

This may not work for all use cases, but for my simple need to wait for one PDF to download it works great. It is based on Walter's comment above.

def get_non_temp_len(download_dir):
    non_temp_files = [i for i in os.listdir(download_dir) if not (i.endswith('.tmp') or i.endswith('.crdownload'))]
    return len(non_temp_files)

download_dir = 'your/download/dir'
original_count = get_non_temp_len(download_dir) # get the file count at the start

# do your selenium stuff 

while original_count == get_non_temp_len(download_dir):
    time.sleep(.5) # wait for file count to change
    
driver.quit()
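
A small variation on the same idea, in case you also need to know which file arrived: take a snapshot of the file names before the click and wait for the set difference. This is just a sketch. `finished_files` and `wait_for_new_files` are hypothetical names, and the temporary suffixes are the two mentioned in this thread.

```python
import os
import time

# Suffixes of in-progress downloads mentioned in this thread
TEMP_SUFFIXES = (".crdownload", ".tmp")


def finished_files(download_dir):
    """Names in download_dir that are not in-progress download artifacts."""
    return {f for f in os.listdir(download_dir) if not f.endswith(TEMP_SUFFIXES)}


def wait_for_new_files(download_dir, before, timeout=60.0, poll=0.5):
    """Return the finished files that were not in `before`; an empty set on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new = finished_files(download_dir) - before
        if new:
            return new
        time.sleep(poll)
    return set()
```

Call `before = finished_files(download_dir)` right before clicking the download button and `wait_for_new_files(download_dir, before)` right after it.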
Shifflett answered 24/6, 2020 at 16:11 Comment(0)
T
0

I had the same problem and this method worked for me.

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import ElementClickInterceptedException
from threading import Thread
import os
import datetime
def checkFilePresence(downloadPath, numberOfFilesInitially, artistName,
    songTitle):
    timeNow = datetime.datetime.now()
    while True:
        numberOfFilesNow = len(os.listdir(downloadPath))
        if numberOfFilesNow > numberOfFilesInitially:
            for folders, subfolders, files in os.walk(downloadPath):
                for file in files:
                    # creation (Windows) or metadata-change (Unix) time of the file
                    changeTime = datetime.datetime.fromtimestamp(
                        os.path.getctime(os.path.join(folders, file)))
                    # an .mp3 that appeared after this call started means the download is done
                    if changeTime > timeNow and file.endswith('.mp3'):
                        return
        time.sleep(0.5)  # avoid a busy loop while waiting
Thermopylae answered 26/9, 2020 at 20:30 Comment(0)
W
0

This code works in headless mode and returns the downloaded file name (based on @protonum's code):

def wait_for_downloads(download_path):
    max_delay = 30
    interval_delay = 0.5
    total_delay = 0
    file = ''
    done = False
    while not done and total_delay < max_delay:
        files = [f for f in os.listdir(download_path) if f.endswith('.crdownload')]
        if not files and len(file) > 1:
            done = True
        if files:
            file = files[0]
        time.sleep(interval_delay)
        total_delay += interval_delay
    if not done:
        logging.error("File(s) couldn't be downloaded")
    return download_path + '/' + file.replace(".crdownload", "")
Wernick answered 9/5, 2021 at 23:14 Comment(0)
R
0
import os
from time import sleep

def wait_for_download_to_be_done(self, path_to_folder, file_name):
    max_time = 60
    time_counter = 0
    # poll every 0.5s until the file exists or max_time seconds have passed
    while not os.path.exists(path_to_folder + file_name) and time_counter < max_time:
        sleep(0.5)
        time_counter += 0.5
    assert os.path.exists(path_to_folder + file_name), "The file wasn't downloaded"
Robrobaina answered 15/6, 2021 at 11:20 Comment(0)
Y
0

Chrome 120+

Download state has to be treated as an int:

 var elements = document.querySelector('downloads-manager')
        .shadowRoot.querySelector('#downloadsList')
        .items
        if (elements.every(e => e.state === 2))
        return elements.map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);

Full code:

def every_downloads_chrome(driver):
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")

    return driver.execute_script("""
        return document.querySelector('downloads-manager')
        .shadowRoot.querySelector('#downloadsList')
        .items.filter(e => e.state === 2)
        .map(e => e.filePath || e.file_path || e.fileUrl || e.file_url);
        """)
Yolande answered 20/7, 2024 at 11:42 Comment(0)
E
-2

When using test automation, it's crucial that developers make the software testable. It is your job to check the software together with its testability, meaning that you need to request a spinner or a simple HTML tag that indicates when the download has finished successfully.

In a case like yours, where you can check neither the UI nor the file system, this is the best way to solve it.

Escudo answered 15/1, 2018 at 15:54 Comment(0)
