I'm using selenium to make a headless scraping of a website within an endpoint of an API using Flask for Python. I made several tests and my selenium scraping code works perfectly within a script and while running as an API in the localhost. However, when I deploy the code in a remote server, the requests always return a 502 Bad Gateway error. It is weird because by logging I can see that the scraping is working correctly, but the server responds with 502 before the scraping finish processing, as if it was trying to set up a proxy and it fails. I also noticed that removing the time.sleep
in my code makes it return a 200 although the result could be wrong because it doesn't give selenium the proper time to load the all the page to scrape.
I also tried to set up to use falcon instead of flask and I get a similar error. This is a sample of my recent code using Falcon:
class GetUrl(object):
def on_get(self, req, resp):
"""
Get Request
:param req:
:param resp:
:return:
"""
# read parameter
req_body = req.bounded_stream.read()
json_data = json.loads(req_body.decode('utf8'))
url = json_data.get("url")
# get the url
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(firefox_options=options)
driver.get(url)
time.sleep(5)
result = False
# check for outbound links
content = driver.find_elements_by_xpath("//a[@class='_52c6']")
if len(content) > 0:
href = content[0].get_attribute("href")
result = True
driver.quit()
# make the return
return_doc = {"result": result}
resp.body = json.dumps(return_doc, sort_keys=True, indent=2)
resp.content_type = 'text/string'
resp.append_header('Access-Control-Allow-Origin', "*")
resp.status = falcon.HTTP_200
I saw some other similar issues like this, but even though I can see that there is a gunicorn
running in my server, I don't have nginx, or at least it is not running where it should running. And I don't think Falcon uses it. So, what exactly am I doing wrong? Some light in this issue is highly appreciated, thank you!
time.sleep
. It will wait only as long as required, which will likely speed things up. – Gandhi