Python - Getting all images from an html file


Asked 28/11, 2010 at 3:16 Answered 28/11, 2010 at 4:34

Solved python image urllib

Can someone help me parse an HTML file to get the links for all the images in the file in Python?

Preferably without a 3rd-party module...

Thanks!

Respectable answered 28/11, 2010 at 3:16 Comment(0)

You can use Beautiful Soup. I know you said without a 3rd party module. However, this is an ideal tool for parsing HTML.

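from urllib.request import urlopen  # Python 3; on Python 2 this was urllib2.urlopen
from bs4 import BeautifulSoup       # Beautiful Soup 4; version 3 used "from BeautifulSoup import BeautifulSoup"

page = BeautifulSoup(urlopen("http://www.url.com"), "html.parser")
# Print the src attribute of every <img> tag that has one
print([img["src"] for img in page.find_all("img", src=True)])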

Ptosis answered 28/11, 2010 at 3:21 Comment(2)

OK. Seems like this will help a lot, so I'll check it out. Thanks! – Respectable 28/11, 2010 at 3:35

I think Russell missed BeautifulSoup(page) – Breaststroke 5/7, 2011 at 21:32

Using only the Python standard library:

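from html.parser import HTMLParser  # Python 3 module name

class MyParse(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; skip <img> tags with no src
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                print(src)

h = MyParse()
with open("index.html") as f:  # the file is closed once parsing is done
    h.feed(f.read())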
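
If the links are needed as a list rather than printed, the same idea can collect them. This is only a minimal sketch: the names ImgCollector and extract_img_urls are made up for illustration, and the optional base_url argument (used to turn relative links into absolute ones) is an assumption, not part of the original answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Hypothetical helper: gathers the src of every <img> tag."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # assumption: URL of the page the HTML came from
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # With an empty base_url, urljoin returns src unchanged
                self.urls.append(urljoin(self.base_url, src))


def extract_img_urls(html, base_url=""):
    """Return the image links found in an HTML string, in document order."""
    parser = ImgCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Calling extract_img_urls(open("index.html").read()) should give the same links the parser above prints, just as a list.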

Hinson answered 28/11, 2010 at 3:38 Comment(2)

You can augment this with urllib to open a web page and download the images. – Gaulin 28/11, 2010 at 3:43

For me this only works with "from HTMLParser import HTMLParser" – Respectability 6/3, 2014 at 15:17
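
The urllib suggestion in the comments could look roughly like this on Python 3. It is a sketch under assumptions: fetch_html and save_images are hypothetical names, the page is assumed to be UTF-8, the image URLs are assumed to be absolute already, and files with the same base name overwrite each other.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_html(page_url):
    """Download a page and return its text (assumes UTF-8 encoding)."""
    with urlopen(page_url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def save_images(image_urls, dest_dir="images"):
    """Download each absolute image URL into dest_dir; return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in image_urls:
        # Use the last path segment as the file name, with a fallback
        name = os.path.basename(urlparse(url).path) or "unnamed"
        path = os.path.join(dest_dir, name)
        with urlopen(url) as resp, open(path, "wb") as out:
            out.write(resp.read())
        saved.append(path)
    return saved
```

The text returned by fetch_html can be passed to h.feed() from the answer above. Relative src values would need urllib.parse.urljoin(page_url, src) before being handed to save_images.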

It's generally accepted that lxml is faster than Beautiful Soup (ref). Its tutorial can be found here: (link). You may also take a look at this old Stack Overflow post.

Pavis answered 28/11, 2010 at 4:34 Comment(0)
