Running selenium browser on server (Flask/Python/Heroku)

2

17

I am scraping some websites that seem to have pretty good protection against it. The only way I can get it to work is to use Selenium to load the page and then scrape stuff from that.

Currently this works on my local computer (a Firefox window opens and closes when I access my page, and its HTML is processed further in my script). However, I need my scraper to be accessible on the web. The scraper is embedded within a Flask app on Heroku. Is there a way to make the Selenium browser work on Heroku servers? Or are there any hosting providers where it can work?

Roomful answered 9/4, 2013 at 14:1 Comment(0)
17

Heroku, wonderful as it is, has a major limitation: you cannot install custom software or, in many cases, custom libraries. To provide an easy-to-use, centrally controlled, managed stack, Heroku strips its servers down, which rules out other uses.

What this boils down to is that there is no Xorg on a Heroku dyno. Without Xorg, and without the ability to install custom software, there is no xvfb either, so you cannot run the browser that Selenium expects to find. Further, the browser itself is generally not available on the dyno.

You'll have better luck with a cloud offering like AWS, where you can install custom software, including firefox, xvfb (to keep from needing all the Xorg overhead), and of course the rest of your scraping stack. This answer explains how to do it properly.
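For what it's worth, here is a rough sketch of what the Firefox + xvfb setup could look like on a server you control. It is not taken from the linked answer. It assumes `Xvfb`, `firefox` and (for recent Selenium versions) `geckodriver` are installed and on the PATH; the display number `:99`, the screen size, and the helper names `missing_tools` and `fetch_html` are my own placeholders.

```python
import os
import shutil
import subprocess
import time


def missing_tools(tools=("Xvfb", "firefox", "geckodriver")):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def fetch_html(url, display=":99"):
    """Load `url` in Firefox on a virtual display and return the page HTML."""
    # Imported lazily so the PATH check above works without Selenium installed.
    from selenium import webdriver

    # Start a virtual X server; the display number is an arbitrary choice.
    xvfb = subprocess.Popen(["Xvfb", display, "-screen", "0", "1280x1024x24"])
    os.environ["DISPLAY"] = display
    time.sleep(1)  # crude wait for Xvfb to come up; a real setup should poll
    try:
        driver = webdriver.Firefox()
        try:
            driver.get(url)
            return driver.page_source
        finally:
            driver.quit()
    finally:
        # Always shut the virtual display down, even if Firefox failed.
        xvfb.terminate()
        xvfb.wait()
```

Calling `missing_tools()` first tells you which of the three binaries still need installing before `fetch_html` has a chance of working.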

Aa answered 3/9, 2013 at 23:41 Comment(0)
7

There are buildpacks that make Selenium work on Heroku.

Add the buildpacks below:

1) heroku buildpacks:add https://github.com/kevinsawicki/heroku-buildpack-xvfb-google-chrome/
2) heroku buildpacks:add https://github.com/heroku/heroku-buildpack-chromedriver

Then set the Heroku stack to cedar-14 as shown below, since the xvfb buildpack works only with cedar-14 (replace stocksdata with your own app name).

heroku stack:set cedar-14 -a stocksdata

Then point Selenium at the Google Chrome binary, as below:

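from selenium import webdriver
from selenium.webdriver import ChromeOptions
options = ChromeOptions()
options.binary_location = "/app/.apt/usr/bin/google-chrome-stable"
# Newer Selenium releases take options= (chrome_options= was deprecated, then removed)
driver = webdriver.Chrome(options=options)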
Abbreviated answered 11/11, 2017 at 10:12 Comment(1)
Your advice worked for me. In my case I manually uploaded chromedriver into the application's bin/ directory and used heroku buildpacks:add https://github.com/heroku/heroku-buildpack-google-chrome instead of xvfb for headless mode. - Dodgem
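To illustrate the headless variant mentioned in the comment, a minimal sketch might look like the following. It assumes the Chrome buildpack exposes the binary path through a `GOOGLE_CHROME_BIN` environment variable (check the buildpack README for your version) and falls back to the path from the answer otherwise. The helper names and the exact flag list are my own choices, not something the answer or the comment specifies.

```python
import os

# Fallback path taken from the answer above.
DEFAULT_CHROME_BIN = "/app/.apt/usr/bin/google-chrome-stable"


def chrome_binary(env=None):
    """Pick the Chrome binary path, preferring GOOGLE_CHROME_BIN if set (assumed name)."""
    env = os.environ if env is None else env
    return env.get("GOOGLE_CHROME_BIN", DEFAULT_CHROME_BIN)


def chrome_arguments():
    """Flags commonly passed to Chrome when running without a display in a container."""
    return ["--headless", "--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage"]


def make_driver():
    """Build a headless Chrome driver; needs Selenium and chromedriver installed."""
    # Imported lazily so the helpers above can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.binary_location = chrome_binary()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

With headless mode there is no need for xvfb at all, which is presumably why the plain Chrome buildpack was enough in the commenter's case.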
