Asynchronously get and store images in Python

The following code is a sample of non-asynchronous code; is there any way to get the images asynchronously?

import urllib
for x in range(0, 10):
    urllib.urlretrieve("http://test.com/file %s.png" % x, "temp/file %s.png" % x)

I have also looked at the Grequests library, but I couldn't figure out from its documentation whether that is possible or how to do it.

Suh answered 22/8, 2013 at 10:8 Comment(0)

You don't need any third-party library. Just create a thread for every request, start the threads, and then either wait for all of them to finish or carry on with your application while the images download in the background.

import threading
import urllib

results = []
def getter(url, dest):
    results.append(urllib.urlretrieve(url, dest))

threads = []
for x in range(0, 10):
    t = threading.Thread(target=getter, args=('http://test.com/file %s.png' % x,
                                              'temp/file %s.png' % x))
    t.start()
    threads.append(t)
# wait for all threads to finish
# You can continue doing whatever you want and
# join the threads when you finally need the results.
# They will fetch your urls in the background without
# blocking your main application.
map(lambda t: t.join(), threads)

Optionally, you can create a thread pool whose workers take URLs and destination paths from a queue.
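
A minimal sketch of that idea on Python 3 (the modules are Queue and urllib on Python 2, and the pool size of 4 is arbitrary):

import queue
import threading
import urllib.request

NUM_WORKERS = 4  # arbitrary pool size

# fill the queue with (url, dest) jobs up front
jobs = queue.Queue()
for x in range(0, 10):
    jobs.put(('http://test.com/file %s.png' % x, 'temp/file %s.png' % x))

def worker():
    # each worker keeps pulling jobs until the queue is empty
    while True:
        try:
            url, dest = jobs.get_nowait()
        except queue.Empty:
            return
        urllib.request.urlretrieve(url, dest)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()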

If you're using Python 3, such a thread pool is already implemented for you in the concurrent.futures module.
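
For instance, a rough equivalent of the loop above using concurrent.futures (max_workers=5 is picked arbitrarily here):

import urllib.request
from concurrent.futures import ThreadPoolExecutor

def getter(url, dest):
    # urlretrieve blocks, but each call runs in its own pool thread
    return urllib.request.urlretrieve(url, dest)

with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(getter,
                           'http://test.com/file %s.png' % x,
                           'temp/file %s.png' % x)
               for x in range(0, 10)]
    results = [f.result() for f in futures]  # blocks here until every download is done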

Underwrite answered 22/8, 2013 at 10:22 Comment(4)
Superb. I don't know how I was living w/o multithreading so far. ThanksSuh
Awesomely simple and useful answer! The use of map was superb (haven't used it before but I'm learning it now)Urge
Note that this use of map no longer works in Python 3, since map is evaluated lazily (only when the data is actually needed). You can use [x.join() for x in threads] as a one-liner, or write out the for-loop: for x in threads: x.join().Dilks
Hi, suppose I have 100k image URLs and I am using thread pooling in Python 3.12. Will the program start 100k threads?Hyacinthia

Something like this should help you

import grequests

urls = ['url1', 'url2', ...]  # this should be the list of urls

requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests)
for response in responses:
    # grequests.map puts None in the list for requests that failed
    if response is not None and 199 < response.status_code < 400:
        name = generate_file_name()    # generate some name for your image file with an extension like example.jpg
        with open(name, 'wb') as f:    # or save to S3 or something like that
            f.write(response.content)

Here, only the downloading of the images is parallel; writing each image's content to a file is still sequential, so you could hand the writes off to threads or do something similar to make that part parallel or asynchronous as well.
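
For example, a rough sketch that reuses the responses list and the generate_file_name placeholder from the snippet above and hands each write to a small thread pool:

from concurrent.futures import ThreadPoolExecutor

def save(response):
    name = generate_file_name()    # same placeholder as above
    with open(name, 'wb') as f:
        f.write(response.content)

# submit each successful response so the file writes can overlap
with ThreadPoolExecutor(max_workers=4) as pool:
    for response in responses:
        if response is not None and 199 < response.status_code < 400:
            pool.submit(save, response)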

Xylograph answered 22/8, 2013 at 10:23 Comment(0)
