Download images from a list of URLs
I have a list of URLs in a text file. I want the images to be downloaded to a particular folder; how can I do it? Is there an add-on available in Chrome, or any other program, to download images from a list of URLs?

Barnardo answered 18/3, 2017 at 18:23 Comment(0)
90
  • Create a folder on your machine.
  • Place your text file of image URLs in that folder.
  • cd to that folder.
  • Use wget -i images.txt
  • You will find all your downloaded files in the folder.

Clothespin answered 11/7, 2017 at 11:46 Comment(3)
Perfect! One more advantage of using Linux.Bakeman
Beware that on Windows, when launching wget from PowerShell, wget is aliased to an internal PowerShell command (Invoke-WebRequest) which behaves slightly differently. If you need the original wget, just open a normal cmd.exe shell and launch it from there.Truitt
I had to brew install wget first, but after that, this was a breeze! Thanks so much!Stellate
9

On Windows 10/11 this is fairly trivial using:

for /F "eol=;" %f in (filelist.txt) do curl -O %f

Note that including eol=; lets us mask individual exclusions by adding ; at the start of any line in filelist.txt that we do not want this time. If using the above in a batch file GetFileList.cmd, double those %'s (%%f).

On my system I simply type Do GetFileList and all those stored URLs are downloaded. Do is an old DOS trick for keeping many small commands in one self-editing zip file; nowadays I use CMD, where Do Edit calls the file up as Notepad "%~f0" so I can paste in a section like this.

Part of Do.bat

:GetFileList
Rem as posted to https://stackoverflow.com/questions/42878196
for /F "eol=;" %%f in (filelist.txt) do curl -O %%f
exit /b 0
goto error GetFileList

Windows 7 has an FTP command, but that can often throw up a firewall dialog requiring a user-authorization response.

If you are running Windows 7 and want to download a list of URLs without installing wget.exe or another dependency like curl.exe (which would be simplest, as in the first command above), the shortest compatible way is a PowerShell command (not my favorite for speed, but needs must).

The file with the URLs is filelist.txt, and IWR (Invoke-WebRequest) is the nearest PowerShell equivalent of wget.

The SecurityProtocol command at the start ensures we are using the modern TLS 1.2 protocol.

-OutF ... Split-Path ... means the filenames will be the same as the remote filenames but saved in the CWD (current working directory); for scripting you can cd /d folder first if necessary.

PS> [Net.ServicePointManager]::SecurityProtocol = "Tls12" ; GC filelist.txt | % {IWR $_ -OutF $(Split-Path $_ -Leaf)}

To run this from CMD, use a slightly different set of quotes around 'Tls12':

PowerShell -C "& {[Net.ServicePointManager]::SecurityProtocol = 'Tls12' ; GC filelist.txt | % {IWR $_ -OutF $(Split-Path $_ -Leaf)}}"
Pertinent answered 2/1, 2022 at 22:22 Comment(0)
3

This needs to be made into a function with error handling, but as written it downloads the images one by one for image classification projects (a sketch of such a function follows the snippet).

    import pandas as pd
    import requests

    urls = pd.read_csv('cat_urls.csv')   # load the URL list as a dataframe

    rows = []

    for index, i in urls.iterrows():
        rows.append(i.iloc[-1])          # take the URL from the last column

    counter = 0

    for i in rows:
        file_name = 'cat' + str(counter) + '.jpg'
        print(file_name)
        response = requests.get(i)
        file = open(file_name, "wb")
        file.write(response.content)
        file.close()
        counter += 1
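
Below is a minimal sketch of the function-with-error-handling version mentioned above, assuming the same cat_urls.csv layout (URL in the last column); the name download_images and its parameters are illustrative, not part of the original answer:

    import os
    import pandas as pd
    import requests

    def download_images(csv_path, out_dir, prefix='cat'):
        """Download every URL in the last column of csv_path into out_dir; return the error count."""
        os.makedirs(out_dir, exist_ok=True)
        urls = pd.read_csv(csv_path).iloc[:, -1]
        errors = 0
        for counter, url in enumerate(urls):
            file_name = os.path.join(out_dir, prefix + str(counter) + '.jpg')
            try:
                response = requests.get(url, timeout=30)
                response.raise_for_status()
                with open(file_name, 'wb') as f:
                    f.write(response.content)
            except (requests.RequestException, OSError):
                errors += 1   # skip unreachable URLs or unwritable files and keep going
        return errors

    # usage: errors = download_images('cat_urls.csv', 'cats')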
Aubade answered 3/8, 2020 at 2:38 Comment(0)
0
import os
import time
import sys
import urllib
from progressbar import ProgressBar

def get_raw_html(url):
    version = (3,0)
    curr_version = sys.version_info
    if curr_version >= version:     #If the Current Version of Python is 3.0 or above
        import urllib.request    #urllib library for Extracting web pages
        try:
            headers = {}
            headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
            request = urllib.request.Request(url, headers = headers)
            resp = urllib.request.urlopen(request)
            respData = str(resp.read())
            return respData
        except Exception as e:
            print(str(e))
    else:                        #If the current version of Python is 2.x
        import urllib2
        import ssl
        try:
            headers = {}
            headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
            request = urllib2.Request(url, headers = headers)
            try:
                response = urllib2.urlopen(request)
            except urllib2.URLError: # Handle SSL certificate failures
                context = ssl._create_unverified_context()
                response = urllib2.urlopen(request, context=context)
            raw_html = response.read()
            return raw_html
        except Exception:
            return "Page Not found"


def next_link(s):
    start_line = s.find('rg_di')
    if start_line == -1:    #If no links are found then give an error!
        end_quote = 0
        link = "no_links"
        return link, end_quote
    else:
        start_line = s.find('"class="rg_meta"')
        start_content = s.find('"ou"',start_line+1)
        end_content = s.find(',"ow"',start_content+1)
        content_raw = str(s[start_content+6:end_content-1])
        return content_raw, end_content


def all_links(page):
    links = []
    while True:
        link, end_content = next_link(page)
        if link == "no_links":
            break
        else:
            links.append(link)      #Append all the links in the list named 'Links'
            #time.sleep(0.1)        #Timer could be used to slow down the request for image downloads
            page = page[end_content:]
    return links

def download_images(links, search_keyword):
    # Relies on urllib.request / urllib.error having been imported by get_raw_html() under Python 3.

    choice = input("Do you want to save the links? [y]/[n]: ")
    if choice=='y' or choice=='Y':
        #write all the links into a text file.
        f = open('links.txt', 'a')        #Open the text file called links.txt
        for link in links:
            f.write(str(link))
            f.write("\n")
        f.close()   #Close the file 
    num = input("Enter number of images to download (max 100): ")
    counter = 1
    errors=0
    search_keyword = search_keyword.replace("%20","_")
    directory = search_keyword+'/'
    if not os.path.isdir(directory):
        os.makedirs(directory)
    pbar = ProgressBar()
    for link in pbar(links):
        if counter<=int(num):
            file_extension = link.split(".")[-1]
            filename = directory + str(counter) + "."+ file_extension
            #print ("Downloading image: " + str(counter)+'/'+str(num))
            try:
                urllib.request.urlretrieve(link, filename)
            except IOError:
                errors+=1
                #print ("\nIOError on Image" + str(counter))
            except urllib.error.HTTPError as e:
                errors+=1
                #print ("\nHTTPError on Image"+ str(counter))
            except urllib.error.URLError as e:
                errors+=1
                #print ("\nURLError on Image" + str(counter))

        counter+=1
    return errors


def search():

    version = (3,0)
    curr_version = sys.version_info
    if curr_version >= version:     #If the Current Version of Python is 3.0 or above
        import urllib.request    #urllib library for Extracting web pages
    else:
        import urllib2 #If current version of python is 2.x

    search_keyword = input("Enter the search query: ")

    #Download Image Links
    links = []
    search_keyword = search_keyword.replace(" ","%20")
    url = 'https://www.google.com/search?q=' + search_keyword+ '&espv=2&biw=1366&bih=667&site=webhp&source=lnms&tbm=isch&sa=X&ei=XosDVaCXD8TasATItgE&ved=0CAcQ_AUoAg'
    raw_html =  (get_raw_html(url))
    links = links + (all_links(raw_html))
    print ("Total Image Links = "+str(len(links)))
    print ("\n")
    errors = download_images(links, search_keyword)
    print ("Download Complete.\n"+ str(errors) +" errors while downloading.")

search()
Olds answered 23/7, 2019 at 19:16 Comment(1)
Hi, welcome to Stack Overflow! To make it more accessible you can usually add some text to describe and explain what the code is doing ;)Gowrie
0

In this Python project I run a search on unsplash.com, which returns a list of URLs; I then save a user-defined number of them to a pre-defined folder. Check it out.
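
A minimal sketch of just the saving step under the same idea, assuming the search has already produced a list of image URLs and that the requests library is available; save_images, out_dir and max_images are illustrative names, not part of the original project:

    import os
    import requests

    def save_images(image_urls, out_dir, max_images):
        """Save up to max_images URLs from image_urls into out_dir."""
        os.makedirs(out_dir, exist_ok=True)
        for n, url in enumerate(image_urls[:max_images]):
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            # derive the extension from the URL, defaulting to .jpg
            ext = os.path.splitext(url.split('?')[0])[1] or '.jpg'
            with open(os.path.join(out_dir, 'unsplash_%d%s' % (n, ext)), 'wb') as f:
                f.write(resp.content)

    # usage: save_images(urls_from_search, 'unsplash_downloads', 10)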

Semipermeable answered 20/8, 2020 at 1:5 Comment(0)
0

On Windows, install wget - https://sourceforge.net/projects/gnuwin32/files/wget/1.11.4-1/

and add C:\Program Files (x86)\GnuWin32\bin to your PATH environment variable.

Create a folder containing a .txt file listing all the image URLs you want to download.

In the address bar at the top of File Explorer, type cmd.

When the Command Prompt opens, enter the following:

wget -i images.txt --no-check-certificate

Glyptography answered 2/1, 2022 at 18:12 Comment(0)
0

If you want a solution on Windows:

Prepare your text file: create a text file named file.txt containing the URLs of the images, one URL per line.

Open PowerShell: press Win + X, then select "Windows PowerShell" or "Windows PowerShell (Admin)" from the menu. Alternatively, you can search for "PowerShell" in the Start menu and open it from there.

Navigate to the directory: use the cd command to change to the directory where your text file is located, then run the following script:

$urls = Get-Content "file.txt"
$outputDirectory = "output"
New-Item -ItemType Directory -Force -Path $outputDirectory | Out-Null

foreach ($url in $urls) {
    $uri = [System.Uri]$url
    # Build the local filename from the URL's last segment, folding any query string
    # into the name so different query variants do not overwrite each other.
    $filename = [System.IO.Path]::GetFileNameWithoutExtension($uri.Segments[-1])
    $extension = [System.IO.Path]::GetExtension($uri.Segments[-1])
    $parameters = $uri.Query -replace '\?', '_' -replace '=', '-' -replace '&', '_'
    $outputFilename = $filename + $parameters + $extension
    $outputPath = Join-Path -Path $outputDirectory -ChildPath $outputFilename
    Invoke-WebRequest -Uri $url -OutFile $outputPath
}
Curare answered 5/4 at 17:21 Comment(0)
0

If you don't want to download software like wget, you could take advantage of Chrome's built-in Save as Webpage, Complete option:

  1. Create an HTML file images.html that references all of your images (a small script to generate it from a URL list is sketched after this list):

    <img src="https://example.com/image1.jpg" />
    <img src="https://example.com/image2.jpg" />
    ...
    
  2. Open the HTML file in Chrome.

  3. File > Save Page As…

  4. Choose Webpage, Complete as the Format.

  5. Navigate to where you saved the files and open the corresponding _files folder. It should contain all the images from the webpage.
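
As a small helper for step 1, here is a minimal sketch that generates such an images.html from a plain URL list; it assumes the URLs live in urls.txt, one per line, and both filenames are illustrative:

    import html

    # Read one image URL per line from urls.txt and wrap each one in an <img> tag.
    with open('urls.txt') as f:
        urls = [line.strip() for line in f if line.strip()]

    with open('images.html', 'w') as out:
        out.write('<html><body>\n')
        for url in urls:
            out.write('    <img src="%s" />\n' % html.escape(url))
        out.write('</body></html>\n')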

Unkenned answered 9/5 at 16:57 Comment(0)
