Naming a file when downloading with Selenium Webdriver

I see that you can set where to download a file to through Webdriver, as follows:

from os import getcwd
from selenium import webdriver

fp = webdriver.FirefoxProfile()

fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")

browser = webdriver.Firefox(firefox_profile=fp)
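
For readers on newer Selenium releases, the same preferences can be set through an Options object instead of a profile. The following is only a sketch assuming Selenium 4 and a local Firefox install; the helper names download_preferences and make_browser are made up for illustration and are not part of Selenium.

```python
import os


def download_preferences(directory, mime_types=("text/csv",)):
    """Return the Firefox download preferences from the question as a dict."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": directory,
        # Comma-separated MIME types that are saved without a prompt.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_browser(directory=None):
    """Start Firefox with those preferences applied (assumes Selenium 4)."""
    # Imported here so the helper above stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in download_preferences(directory or os.getcwd()).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

As with the profile version, this only controls where files land, not what they are called.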

But I was wondering whether there is a similar way to give the file a name when it is downloaded. Preferably it would not be tied to the profile, since I will be downloading ~6000 files through one browser instance and do not want to reinitialize the driver for each download.

Kerf answered 9/11, 2012 at 23:20 Comment(0)
A
3

I do not know if there is a pure Selenium handler for this, but here is what I have done when I needed to do something with the downloaded file.

  1. Set a loop that polls your download directory for the latest file that does not have a .part extension (this indicates a partial download and would occasionally trip things up if not accounted for). Put a timer on this so that you don't loop forever if a timeout or some other error keeps the download from completing. I used the output of the ls -t <dirname> command in Linux (my old code uses the commands module, which is deprecated, so I won't show it here :) ) and got the first file by using

    # result = output of ls -t
    result = result.split('\n')[1].split(' ')[-1]
    
  2. If the while loop exits successfully, the topmost file in the directory will be your file, which you can then modify using os.rename (or anything else you like).
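
The two steps above could be sketched in Python 3 roughly as follows. This is an illustration rather than the original code: the function names wait_for_new_file and rename_download, the default timeout, and the choice to compare against a directory snapshot taken before the click (instead of parsing ls -t) are all assumptions.

```python
import os
import time


def wait_for_new_file(directory, before, timeout=60.0, poll_interval=0.5):
    """Return the path of a finished file that was not in `before`.

    `before` is the set of file names seen before the download was triggered.
    Raises TimeoutError if no finished new file shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = set(os.listdir(directory))
        # Firefox keeps a "<name>.part" file next to the target while downloading.
        partial = [n for n in names if n.endswith(".part")]
        new = [n for n in names - before if not n.endswith(".part")]
        if new and not partial:
            paths = [os.path.join(directory, n) for n in new]
            return max(paths, key=os.path.getmtime)  # newest of the new files
        time.sleep(poll_interval)
    raise TimeoutError("no finished download in %s after %.1f s" % (directory, timeout))


def rename_download(directory, before, new_name, timeout=60.0):
    """Wait for the next download to finish, then rename it inside `directory`."""
    newest = wait_for_new_file(directory, before, timeout=timeout)
    target = os.path.join(directory, new_name)
    os.rename(newest, target)
    return target


# Hypothetical usage inside the download loop:
#   before = set(os.listdir(download_dir))
#   download_button.click()
#   rename_download(download_dir, before, "report-0001.csv")
```

Taking the snapshot before each click keeps files renamed in earlier iterations from being mistaken for the newest download.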

Probably not the answer you were looking for, but hopefully it points you in the right direction.

Attentive answered 9/11, 2012 at 23:38 Comment(4)
Thanks, I think I will go with something like this: save the preferred file names to a file, then, after downloading everything, list all the files by creation date and rename them at that point. – Kerf
@Kerf You can actually rename at the time of download using that method if it works better. Happy to help. – Attentive
+1, this is what I did to overcome this problem: essentially, constantly poll the download directory. – Kendy
@Kendy Ha, nice to hear somebody else does it that way too :) – Attentive
N
4

I would suggest a slightly unusual approach: if possible, do not download the files with Selenium at all.

I mean, get the file URL and use the urllib library to download the file and save it to disk 'manually'. The issue is that Selenium has no tool for handling Windows dialogs such as the 'Save As' dialog. I'm not sure, but I doubt that it can handle any OS dialogs at all; please correct me if I'm wrong. :)

Here's a tiny example:

# Python 2; in Python 3 the same function lives at urllib.request.urlretrieve
import urllib
urllib.urlretrieve("http://www.yourhost.com/yourfile.ext", "your-file-name.ext")

The only job for us here is to make sure that we handle all the urllib Exceptions. Please see http://docs.python.org/2/library/urllib.html#urllib.urlretrieve for more info.
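
For Python 3, the same idea with basic exception handling might look like the sketch below. The function name download_as and the choice to report failures with print and a boolean return value are assumptions made for illustration; urllib.request.urlretrieve is documented as a legacy interface but is still available.

```python
import urllib.error
import urllib.request


def download_as(url, filename):
    """Fetch `url` and save it under `filename`; return True on success."""
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 403.
        print("HTTP error %s for %s" % (exc.code, url))
    except urllib.error.URLError as exc:
        # No usable response: DNS failure, refused connection, missing file, ...
        print("could not fetch %s: %s" % (url, exc.reason))
    else:
        return True
    return False
```

Because the target name is passed explicitly, the file is saved under the name you choose, which is the point of skipping the browser download.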

Natika answered 10/11, 2012 at 8:33 Comment(5)
Thanks for the suggestion, but I need to use Selenium (or something similar) as I have to navigate through a few non-RESTful links first. – Kerf
I realize this, but you can easily use urllib functions in your test case, can't you? – Natika
I'm not sure, could you give me an example? What I'm trying to do is download ~6000 files (logging in up front, navigating a couple of pages, then clicking a download button and a next button 6000 times). So would I grab the URL of the download link and fetch it with urllib? I don't think it would work because I have to be logged in, and again, the links are non-RESTful, so I think it would need the session variables as well. But I'll try it! And if you have any other suggestions, let me know. – Kerf
Well, you can get the cookies by calling cookies = driver.get_cookies(). You won't be able to use urllib, but urllib2 openers could help (urllib doesn't know how to send cookie headers, and urllib2 does):
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    opener.addheaders.append(('Cookie', 'key=val; key2=val2'))
Also, you can build a Request object and use its add_header method to add a raw header to your HTTP request (the header is named Cookie, and its pairs are separated by '; '):
    request.add_header('Cookie', 'key=val; key2=val2')
Please tell me if something is unclear. – Natika
Why not use Selenium to get to the final link, then read the link's href with Selenium and pass it to urllib to actually retrieve the file? – Beak
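
The cookie hand-off discussed in these comments could be sketched in Python 3 as follows. It assumes the list of dicts that driver.get_cookies() returns (each with "name" and "value" keys); the helper names cookie_header and request_with_session are hypothetical, and whether the site accepts the request may also depend on other headers.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into one value for the HTTP Cookie header."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in selenium_cookies)


def request_with_session(url, selenium_cookies):
    """Build a urllib request that carries the browser's session cookies."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie_header(selenium_cookies))
    return request


# Hypothetical usage once Selenium has logged in and reached the download link:
#   request = request_with_session(href, driver.get_cookies())
#   with urllib.request.urlopen(request) as response, open("my-name.ext", "wb") as out:
#       out.write(response.read())
```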
B
1

Solution with code as suggested by the selected answer. Rename the file after each one is downloaded.

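import os

# SAVE_TO_DIRECTORY and docName are defined elsewhere in the script.
# Build full paths so the isfile() check does not depend on the working directory.
paths = [os.path.join(SAVE_TO_DIRECTORY, f) for f in os.listdir(SAVE_TO_DIRECTORY)]
files = [p for p in paths if os.path.isfile(p)]
newest_file = max(files, key=os.path.getmtime)  # most recently modified file
os.rename(newest_file, os.path.join(SAVE_TO_DIRECTORY, docName + ".pdf"))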

This answer was posted as an edit to the question naming a file when downloading with Selenium Webdriver by the OP user1253952 under CC BY-SA 3.0.

Barricade answered 22/12, 2022 at 13:35 Comment(0)
