How can I take a screenshot/image of a website using Python?

Asked 28/7, 2009 at 22:48 Answered 17/9, 2020 at 0:30

What I want to achieve is to get a screenshot of any website using Python.

Env: Linux

Age answered 28/7, 2009 at 22:48 Comment(6)

A quick search of the site brings up many, many near-duplicates of this. Here's a good start: #714438 – Figurant 28/7, 2009 at 22:55

Shog9: Thanks!! your link has some... will check it. – Age 28/7, 2009 at 23:22

Shog9: why don't you add it as an answer? so it can give you points. – Age 28/7, 2009 at 23:27

@Esteban: it's not my work - someone else took the time to dig into this and find the resources; i'm just posting links. :-) – Figurant 29/7, 2009 at 3:29

I would suggest leaning towards phantomjs now as per the explanation here as it provides a very clean and robust solution: #9390993 – Finespun 22/2, 2012 at 19:16

@Figurant The answer referenced in your first comment has been removed because of "moderation." Thanks! – Gasholder 28/9, 2015 at 18:6

On the Mac, there's webkit2png and on Linux+KDE, you can use khtml2png. I've tried the former and it works quite well, and heard of the latter being put to use.

I recently came across QtWebKit which claims to be cross platform (Qt rolled WebKit into their library, I guess). But I've never tried it, so I can't tell you much more.

The QtWebKit link shows how to access it from Python. You should be able to at least use subprocess to do the same with the others.

Gratulation answered 29/7, 2009 at 1:12 Comment(1)

khtml2png is outdated according to the website, python-webkit2png is recommended by them. – Cuneo 10/6, 2017 at 14:32

Here is a simple solution using webkit: http://webscraping.com/blog/Webpage-screenshots-with-webkit/

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class Screenshot(QWebView):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def capture(self, url, output_file):
        self.load(QUrl(url))
        self.wait_load()
        # set to webpage size
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        # render image
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        print('saving %s' % output_file)
        image.save(output_file)

    def wait_load(self, delay=0):
        # process app events until page loaded
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

s = Screenshot()
s.capture('http://webscraping.com', 'website.png')
s.capture('http://webscraping.com/blog', 'blog.png')

Company answered 20/8, 2012 at 1:21 Comment(15)

Works well, thank you. However, works reliably only if run from the command line. In a django project, one would use subprocess.Popen() – Cypher 21/11, 2012 at 19:52

works fine from within a python web framework. However takes some effort to get webkit working headless. – Company 22/11, 2012 at 22:46

did anyone experience problems using @Company ´s method? It does not work on every webpage... – Ushas 19/9, 2014 at 14:20

what kind of webpage did it fail for? I would expect it to fail if the webpage loaded the content with AJAX or relied on a plugin. – Company 20/9, 2014 at 21:7

Well maybe it works, but installing Webkit is real rocket science especially if you have to do it on multiple systems, therefore I prefer nodejs approach proposed by Aamir Adnan. – Grandfatherly 5/12, 2014 at 8:2

apt-get install python-qt4 – Company 6/12, 2014 at 9:7

I am running this code in a loop. It works fine the first time, but at the end the program terminates with the message Segmentation fault (core dumped). Can you please help me run this code in a loop? – Straightout 29/12, 2014 at 7:49

I just tried to use this method on Earth :: Global weather map and it just gives a black image, so it doesn't work well for all web pages. I'm guessing this has something to do with the animation being run on that site? – Nombril 2/1, 2016 at 12:26

I tried @Company's code, but instead of a single URL I am passing a list, i.e. inside a for loop: for url in urllist: s.capture(url, filename). Somewhere in the middle, all the images start being identical, even though the URLs are different. Is there a bug? – Ascription 17/2, 2016 at 16:40

For a list try this example: webscraping.com/blog/… – Company 19/2, 2016 at 9:51

@hoju, I tried your solution and it works. But the page is rendered narrow, as if it were opened on a mobile device. I use self.page().setViewportSize(QSize(width, height)) to resize the page. Is there a way to automatically get the width and height of the original webpage? – Colleague 25/3, 2016 at 12:10

Hey @Company I am using Windows and python version 3.5. I have a python script which I use to scrape data from multiple urls at a single run of the script. How can I take the screenshot of those URLs? Please help – Huffman 1/6, 2016 at 11:9

@Company I meant the webpages pertaining to those URLs – Huffman 1/6, 2016 at 11:15

@Company , Can you please update the code accordingly to PyQt5 ? – Housewares 10/6, 2020 at 10:15

For anyone trying on mac, pyqt4 is not supported on macOS Sierra and above according to this thread. – Aged 26/7, 2020 at 10:47

Here is my solution, put together with help from various sources. It captures the full web page, optionally crops it, and can also generate a thumbnail from the cropped image.

Requirements:

Install NodeJS
Using Node's package manager, install phantomjs: npm -g install phantomjs
Install Selenium (in your virtualenv, if you are using one)
Install ImageMagick
Add phantomjs to the system path (on Windows)

import os
from subprocess import Popen, PIPE
from selenium import webdriver

abspath = lambda *p: os.path.abspath(os.path.join(*p))
ROOT = abspath(os.path.dirname(__file__))


def execute_command(command):
    result = Popen(command, shell=True, stdout=PIPE).stdout.read()
    if len(result) > 0 and not result.isspace():
        raise Exception(result)


def do_screen_capturing(url, screen_path, width, height):
    print("Capturing screen..")
    driver = webdriver.PhantomJS()
    # The service log file is saved in the same directory.
    # To store the log file elsewhere, initialize
    # webdriver.PhantomJS() like this:
    # driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
    driver.set_script_timeout(30)
    if width and height:
        driver.set_window_size(width, height)
    driver.get(url)
    driver.save_screenshot(screen_path)
    driver.quit()  # shut down the PhantomJS process once the image is saved


def do_crop(params):
    print("Cropping captured image..")
    command = [
        'convert',
        params['screen_path'],
        '-crop', '%sx%s+0+0' % (params['width'], params['height']),
        params['crop_path']
    ]
    execute_command(' '.join(command))


def do_thumbnail(params):
    print("Generating thumbnail from cropped captured image..")
    command = [
        'convert',
        params['crop_path'],
        '-filter', 'Lanczos',
        '-thumbnail', '%sx%s' % (params['width'], params['height']),
        params['thumbnail_path']
    ]
    execute_command(' '.join(command))


def get_screen_shot(**kwargs):
    url = kwargs['url']
    width = int(kwargs.get('width', 1024)) # screen width to capture
    height = int(kwargs.get('height', 768)) # screen height to capture
    filename = kwargs.get('filename', 'screen.png') # file name e.g. screen.png
    path = kwargs.get('path', ROOT) # directory path to store screen

    crop = kwargs.get('crop', False) # crop the captured screen
    crop_width = int(kwargs.get('crop_width', width)) # the width of crop screen
    crop_height = int(kwargs.get('crop_height', height)) # the height of crop screen
    crop_replace = kwargs.get('crop_replace', False) # does crop image replace original screen capture?

    thumbnail = kwargs.get('thumbnail', False) # generate thumbnail from screen, requires crop=True
    thumbnail_width = int(kwargs.get('thumbnail_width', width)) # the width of thumbnail
    thumbnail_height = int(kwargs.get('thumbnail_height', height)) # the height of thumbnail
    thumbnail_replace = kwargs.get('thumbnail_replace', False) # does thumbnail image replace crop image?

    screen_path = abspath(path, filename)
    crop_path = thumbnail_path = screen_path

    if thumbnail and not crop:
        raise Exception('Thumbnail generation requires crop image, set crop=True')

    do_screen_capturing(url, screen_path, width, height)

    if crop:
        if not crop_replace:
            crop_path = abspath(path, 'crop_'+filename)
        params = {
            'width': crop_width, 'height': crop_height,
            'crop_path': crop_path, 'screen_path': screen_path}
        do_crop(params)

        if thumbnail:
            if not thumbnail_replace:
                thumbnail_path = abspath(path, 'thumbnail_'+filename)
            params = {
                'width': thumbnail_width, 'height': thumbnail_height,
                'thumbnail_path': thumbnail_path, 'crop_path': crop_path}
            do_thumbnail(params)
    return screen_path, crop_path, thumbnail_path


if __name__ == '__main__':
    '''
        Requirements:
        Install NodeJS
        Using Node's package manager install phantomjs: npm -g install phantomjs
        install selenium (in your virtualenv, if you are using that)
        install imageMagick
        add phantomjs to system path (on windows)
    '''

    url = 'https://mcmap.net/q/239414/-how-can-i-take-a-screenshot-image-of-a-website-using-python'
    screen_path, crop_path, thumbnail_path = get_screen_shot(
        url=url, filename='sof.png',
        crop=True, crop_replace=False,
        thumbnail=True, thumbnail_replace=False,
        thumbnail_width=200, thumbnail_height=150,
    )
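
The crop and thumbnail helpers above join the command into one string and run it with shell=True, which breaks on paths containing spaces. A minimal sketch of one possible alternative follows: it passes the argument list straight to subprocess. The names crop_args and run_command are hypothetical and not part of the original answer, and unlike execute_command, this version treats a non-zero exit status (rather than any output on stdout) as failure.

```python
import subprocess


def crop_args(screen_path, crop_path, width, height):
    """Build the ImageMagick argument list for cropping from the top-left corner."""
    return [
        'convert',
        screen_path,
        '-crop', '%sx%s+0+0' % (width, height),
        crop_path,
    ]


def run_command(args):
    """Run a command without a shell; raise CalledProcessError on a non-zero exit."""
    subprocess.run(args, check=True)


# Paths with spaces stay intact because each list item is one argument.
args = crop_args('/tmp/my screens/screen.png',
                 '/tmp/my screens/crop_screen.png', 1024, 768)
print(args)
# run_command(args)  # requires ImageMagick's convert to be installed
```

The same pattern would apply to the thumbnail command: build the list, then hand it to run_command instead of joining it into a string.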


Falla answered 5/8, 2013 at 21:30 Comment(6)

Works perfectly in my Django view. No need to set default user-agent, only screen resolution. – Paramedic 9/7, 2014 at 16:42

What if a webpage requires certificates for access ?? – Outpatient 9/6, 2015 at 11:33

Question was for Python, not NodeJS. – Denyse 20/10, 2016 at 14:17

answer is for Python, not NodeJS, this is how a plethora of companies are doing Virtual test users with Python running things (he could install PhantomJS without Node, but it's far easier to have npm available, especially if you'll be deploying it to a remote system) – Oralla 30/5, 2018 at 10:30

This was a great answer, but PhantomJS is discontinued. You can replace "webdriver.PhantomJS()" – Reducer 10/4, 2020 at 14:5

This was a great answer, but PhantomJS is discontinued and the call can be replaced by driver = webdriver.Chrome() which requires the installation of chromedriver. Since this will not be headless, it also makes for a slower experience with stuff flashing on screen, but it works. It makes the answer very similar to the good one from Joolah (which is simpler and with fewer dependencies). – Reducer 10/4, 2020 at 14:6

You can do this using Selenium:

from selenium import webdriver

DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()

https://sites.google.com/a/chromium.org/chromedriver/getting-started

Sulfa answered 31/5, 2018 at 3:0 Comment(6)

this is nice and quick. is there a way to get the full page? currently, only the top portion of the page will be saved. E.g., if a page can be scrolled to the bottom, the above will only get the result of scrolling all the way up. – Aurify 14/1, 2020 at 0:44

@Aurify You can scroll the webpage using driver.execute_script("window.scrollTo(0, Y)"). Where 'Y' is the screen height. You may set screenshot = driver.save_screenshot('my_screenshot.png') and the above code in a loop until your full webpage gets covered. I am not that sure about this but this logically sounds fine to me. – Jot 3/2, 2020 at 6:31

@Aurify You can also do driver.execute_script('document.body.style.zoom = "50%"') – Reducer 10/4, 2020 at 14:17

do we need to have Chrome installed? – Teahouse 9/5, 2020 at 15:40

@Teahouse yes you do need chrome installed. – Authority 10/6, 2020 at 2:13

As an aside to this, I made a small wrapper library around Selenium that streamlines the process - github.com/wirelessfuture/pywebcapture - it gets the total scroll height of the page – Soupy 28/7, 2020 at 22:5
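
The scroll-and-capture loop suggested in the comments above could be sketched roughly as follows. The helper scroll_offsets is hypothetical (it is not part of Selenium); it only computes the Y positions to scroll to. The Selenium part is left as a comment because it needs a browser, and it assumes the page height can be read from document.body.scrollHeight, which may not hold for every page.

```python
def scroll_offsets(total_height, viewport_height):
    """Return the Y offsets to scroll to so that consecutive
    viewport-sized screenshots cover a page of total_height pixels."""
    if total_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    return list(range(0, total_height, viewport_height))


print(scroll_offsets(2500, 1000))

# With a Selenium driver, the loop would look roughly like this (untested sketch):
#   total = driver.execute_script("return document.body.scrollHeight")
#   view = driver.execute_script("return window.innerHeight")
#   for i, y in enumerate(scroll_offsets(total, view)):
#       driver.execute_script("window.scrollTo(0, arguments[0])", y)
#       driver.save_screenshot("part_%d.png" % i)
```

Note that a browser cannot scroll past the bottom of the page, so the last image usually overlaps the previous one and would need cropping if the parts are stitched together.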

11 years later...

Taking a website screenshot using Python 3.6 and the Google PageSpeed Insights API v5:

import base64
import requests
import traceback
import urllib.parse as ul

# It's possible to make requests without the api key, but the number of requests is very limited  

url = "https://duckgo.com"
urle = ul.quote_plus(url)
image_path = "duckgo.jpg"

key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
strategy = "desktop" # "mobile"
u = f"https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key={key}&strategy={strategy}&url={urle}"

try:
    j = requests.get(u).json()
    ss_encoded = j['lighthouseResult']['audits']['final-screenshot']['details']['data'].replace("data:image/jpeg;base64,", "")
    ss_decoded = base64.b64decode(ss_encoded)
    with open(image_path, 'wb') as f:
        f.write(ss_decoded)
except Exception:
    print(traceback.format_exc())
    exit(1)

Notes:

Live Demo
Pros: Free
Cons: Low Resolution
Get API Key
Docs
Limits:
- Queries per day = 25,000
- Queries per 100 seconds = 400
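
The snippet above strips a hard-coded "data:image/jpeg;base64," prefix before decoding. A slightly more defensive sketch, assuming the field stays a standard base64 data URI, splits on the first comma instead, so a different MIME type would not silently break the decode. The function name decode_data_uri and the sample payload are made up for illustration.

```python
import base64


def decode_data_uri(data_uri):
    """Split a 'data:<mime>;base64,<payload>' URI into (mime, raw_bytes)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


# Tiny made-up payload standing in for the 'final-screenshot' data field.
fake_image = b"\xff\xd8\xff\xe0" + b"fake"
sample = "data:image/jpeg;base64," + base64.b64encode(fake_image).decode("ascii")

mime, raw = decode_data_uri(sample)
print(mime, len(raw))
```

The returned MIME type could also be used to pick the file extension instead of assuming .jpg.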

Delilahdelimit answered 17/9, 2020 at 0:30 Comment(3)

Works, thanks a lot! However, it seems to be slow? – Shelleyshellfire 8/2, 2021 at 18:56

This is convenient. But the screenshot is too short. And look like the height cannot be changed – Geisler 19/10, 2021 at 4:14

@OlegO It's probably slow because lighthouse also measures other page parameters; the screenshot is just one of them. – Delilahdelimit 17/6, 2023 at 12:24

Using Rendertron is an option. Under the hood, this is a headless Chrome exposing the following endpoints:

/render/:url: Access this route e.g. with requests.get if you are interested in the DOM.
/screenshot/:url: Access this route if you are interested in a screenshot.

Normally you would install rendertron with npm, run rendertron in one terminal, access http://localhost:3000/screenshot/:url and save the file. However, a demo is available at render-tron.appspot.com, which makes it possible to run this Python 3 snippet locally without installing the npm package:

import requests

BASE = 'https://render-tron.appspot.com/screenshot/'
url = 'https://google.com'
path = 'target.jpg'
response = requests.get(BASE + url, stream=True)
# save file, see https://mcmap.net/q/73859/-how-to-download-image-using-requests
if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)

Knifeedged answered 11/3, 2019 at 20:46 Comment(2)

I like this answer a lot due to its potential, but the documentation on rendertron is pretty poor, so it's difficult to figure out how to use it beyond just your example here. what would an actual, working example look like? Say for someone that just installed rendertron and wants to screenshot this page here? – Godhood 13/2, 2020 at 9:55

As mentioned above, after you've installed rendertron, you would run rendertron in a terminal, and it should then listen on port 3000. A screenshot of this very page should then be available at localhost:3000/screenshot/https://stackoverflow.com/questions/…. You can check that by browsing there with your favorite browser, and the code snippet in my answer basically just stores that image to the drive. Of course, you'd have to set BASE = 'http://localhost:3000/screenshot/' and url = 'https://stackoverflow.com/questions/1197172'. – Knifeedged 17/2, 2020 at 21:22

I can't comment on ars's answer, but I actually got Roland Tapken's code running using QtWebkit and it works quite well.

Just wanted to confirm that what Roland posts on his blog works great on Ubuntu. Our production version ended up not using any of what he wrote but we are using the PyQt/QtWebKit bindings with much success.

Note: The URL used to be: http://www.blogs.uni-osnabrueck.de/rotapken/2008/12/03/create-screenshots-of-a-web-page-using-python-and-qtwebkit/ I've updated it with a working copy.

Nonperformance answered 29/7, 2009 at 4:19 Comment(2)

Cool. I think that's the lib I'll try the next time I need something like this. – Gratulation 29/7, 2009 at 4:48

We ended up putting a RabbitMQ server on top of it and building some code to control the Xvfb servers and the processes running in them to pseudo-thread the screenshots being built. It runs decently fast with an acceptable amount of memory usage. – Nonperformance 29/7, 2009 at 4:52

This is an old question and most answers are a bit dated. Currently, I would do one of two things.

1. Create a program that takes the screenshots

I would use Pyppeteer to take screenshots of websites. It is an unofficial Python port of Puppeteer. It spins up a headless Chrome (Chromium) browser, so the screenshots will look exactly like they would in a normal browser.

This is taken from the pyppeteer documentation:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

2. Use a screenshot API

You could also use a screenshot API such as this one. The nice thing is that you don't have to set everything up yourself but can simply call an API endpoint.

This is taken from the screenshot API's documentation:

import urllib.parse
import urllib.request
import ssl

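# Note: this disables TLS certificate verification for all HTTPS requests
# made through urllib in this process, which is insecure; remove it if
# certificate verification works on your system.
ssl._create_default_https_context = ssl._create_unverified_context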

# The parameters.
token = "YOUR_API_TOKEN"
url = urllib.parse.quote_plus("https://example.com")
width = 1920
height = 1080
output = "image"

# Create the query URL.
query = "https://screenshotapi.net/api/v1/screenshot"
query += "?token=%s&url=%s&width=%d&height=%d&output=%s" % (token, url, width, height, output)

# Call the API.
urllib.request.urlretrieve(query, "./example.png")
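
As a side note, the query string above can also be built with urllib.parse.urlencode, which encodes every value in one step. This is only a sketch: build_query is a hypothetical helper, and the endpoint and parameter names are simply the ones used in the snippet above, not verified against the current API.

```python
from urllib.parse import urlencode


def build_query(endpoint, **params):
    """Append URL-encoded query parameters to an endpoint."""
    # urlencode applies quote_plus to every value, so the target URL
    # does not need to be encoded separately beforehand.
    return endpoint + "?" + urlencode(params)


query = build_query(
    "https://screenshotapi.net/api/v1/screenshot",
    token="YOUR_API_TOKEN",
    url="https://example.com/?a=1&b=2",
    width=1920,
    height=1080,
    output="image",
)
print(query)
```

This matters mostly when the target URL has its own query string, as in the example, since an unencoded & would otherwise be read as a parameter separator of the API call.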

Dray answered 18/8, 2020 at 10:12 Comment(0)

This uses the web service s-shot.ru, so it is not very fast, but it is easy to configure what you need through the URL parameters. You can also easily capture full-page screenshots.

import requests
import urllib.parse

BASE = 'https://mini.s-shot.ru/1024x0/JPEG/1024/Z100/?'  # you can modify size, format, zoom
url = 'https://stackoverflow.com/'  # or whatever link you need
url = urllib.parse.quote_plus(url)  # the service needs the link in URL-encoded form
print(url)

path = 'target1.jpg'
response = requests.get(BASE + url, stream=True)

if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)

Thickleaf answered 12/12, 2019 at 7:43 Comment(1)

Awesome, tried a lot of the single code block answers, this was the first one that worked for me on Ubuntu 20.x. – Sholes 6/4, 2022 at 13:29

You can use the Google PageSpeed API to do this easily. In my current project, I use a PageSpeed API query written in Python to capture a screenshot of any URL provided and save it to a location. Have a look.

import base64
import json
import os
import sys
import urllib.error
import urllib.parse
import urllib.request

#   The website's URL and the output image path as inputs
site = sys.argv[1]
imagePath = sys.argv[2]

#   The Google API.  Remove "&strategy=mobile" for a desktop screenshot
api = "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=" + urllib.parse.quote(site)

#   Get the results from Google
try:
    site_data = json.load(urllib.request.urlopen(api))
except (urllib.error.URLError, ValueError):
    print("Unable to retrieve data")
    sys.exit()

#   A missing key means the response did not contain a screenshot
try:
    screenshot_encoded = site_data['screenshot']['data']
except KeyError:
    print("Invalid JSON encountered.")
    sys.exit()

#   Google has a weird way of encoding the Base64 data
screenshot_encoded = screenshot_encoded.replace("_", "/")
screenshot_encoded = screenshot_encoded.replace("-", "+")

#   Decode the Base64 data
screenshot_decoded = base64.b64decode(screenshot_encoded)

#   Create the output directory if it does not exist yet
directory = os.path.dirname(imagePath)
if directory:
    os.makedirs(directory, exist_ok=True)

#   Save the file (binary mode, since the decoded data is bytes)
with open(imagePath, 'wb') as file_:
    file_.write(screenshot_decoded)

Unfortunately, there are the following drawbacks. If these do not matter, you can proceed with the Google PageSpeed API. It works well.

The maximum width is 320px
According to Google API Quota, there is a limit of 25,000 requests per day
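
Regarding the "weird" encoding handled in the code above: replacing "_" with "/" and "-" with "+" is exactly what converts URL-safe Base64 back to the standard alphabet, so the standard library's base64.urlsafe_b64decode can do both steps at once. A small sketch follows; decode_urlsafe is a hypothetical helper, and the padding step is an assumption for the case where the trailing "=" characters were stripped.

```python
import base64


def decode_urlsafe(data):
    """Decode URL-safe Base64 ('-' and '_' instead of '+' and '/')."""
    # Restore any stripped '=' padding so the length is a multiple of 4.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded)


# Bytes chosen so that the encoded form contains both '-' and '_'.
raw = bytes([0xfb, 0xff, 0xfe, 0x3e])
encoded = base64.urlsafe_b64encode(raw).decode("ascii")

# The manual replacement from the answer gives the same result.
manual = base64.b64decode(encoded.replace("_", "/").replace("-", "+"))
print(encoded, decode_urlsafe(encoded) == manual == raw)
```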

Krystlekrystyna answered 4/11, 2019 at 3:39 Comment(0)

You don't mention what environment you're running in, which makes a big difference because there isn't a pure Python web browser that's capable of rendering HTML.

But if you're using a Mac, I've used webkit2png with great success. If not, as others have pointed out there are plenty of options.

Zarla answered 28/7, 2009 at 23:28 Comment(0)

I created a library called pywebcapture, a wrapper around Selenium, that does just that:

pip install pywebcapture

Once you install with pip, you can do the following to easily get full size screenshots:

# import modules
from pywebcapture import loader, driver

# load csv with urls
csv_file = loader.CSVLoader("csv_file_with_urls.csv", has_header_bool, url_column, optional_filename_column)
uri_dict = csv_file.get_uri_dict()

# create instance of the driver and run
d = driver.Driver("path/to/webdriver/", output_filepath, delay, uri_dict)
d.run()

Enjoy!

https://pypi.org/project/pywebcapture/

Soupy answered 28/7, 2020 at 22:12 Comment(0)

Try this..

#!/usr/bin/env python

import gtk.gdk
import random
import time

while 1 :
    # generate a random time between 120 and 300 sec
    random_time = random.randrange(120,300)

    # wait between 120 and 300 seconds (or between 2 and 5 minutes)
    print "Next picture in: %.2f minutes" % (float(random_time) / 60)

    time.sleep(random_time)

    w = gtk.gdk.get_default_root_window()
    sz = w.get_size()

    print "The size of the window is %d x %d" % sz

    pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB,False,8,sz[0],sz[1])
    pb = pb.get_from_drawable(w,w.get_colormap(),0,0,0,0,sz[0],sz[1])

    ts = time.time()
    filename = "screenshot"
    filename += str(ts)
    filename += ".png"

    if pb is not None:
        pb.save(filename,"png")
        print "Screenshot saved to "+filename
    else:
        print "Unable to get the screenshot."

Mickey answered 18/11, 2013 at 10:9 Comment(0)

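import subprocess

def screenshots(url, name):
    # Pass the arguments as a list (no shell), so URLs containing
    # characters such as & or ? are not interpreted by the shell.
    subprocess.run(['webkit2png', '-F', '-o', name, url, '-D', './screens'])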

Merchantman answered 19/3, 2020 at 14:54 Comment(2)

Welcome to Stack Overflow! To make your answer stand out, it would be great to add some explanation of your approach (e.g. what are all of those parameters to webkit2png?) and links to documentation. – Heaven 19/3, 2020 at 15:15

webkit2png is not installed by default – Olvan 24/9, 2020 at 15:20
