How to call JavaScript function using BeautifulSoup and Python

3

12

I am scraping data from a website as part of my project. I can make the request and grab the data that is present in the DOM. However, some data is only rendered when a JavaScript onClick handler runs.

One option is to use Selenium to click on the link (which calls the JavaScript function) and grab the rendered data, but this is time-consuming, and I don't want to open a browser.

Is there any way other than Selenium to achieve this?

Website: http://catalog.fullerton.edu/preview_entity.php?catoid=16&ent_oid=1849

In the courses section of this webpage, all the courses are hyperlinks, and as soon as someone clicks on a course, a JavaScript function gets called. I need the data that is rendered after that function call.

Coyne answered 4/2, 2018 at 0:0
6

You can't. If you want to run JavaScript, you'll need to use a headless browser. Otherwise, you'll have to disassemble the JavaScript and see what it does.

Open your browser's developer tools, switch to the Network tab, and then click on the element.

The Network tab now shows the request that the JavaScript makes: it downloads new HTML from a URL. You can easily send the same request yourself with urllib.
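As a rough illustration of replaying that request, here is a minimal sketch using only the standard library. The endpoint and the parameter names are hypothetical placeholders, not the site's real ones; copy the actual URL and query string from the request you see in the Network tab.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace it with the URL shown in the Network tab.
AJAX_ENDPOINT = "http://example.com/ajax/course_details.php"


def build_url(endpoint, params):
    """Append URL-encoded query parameters to the endpoint."""
    return f"{endpoint}?{urlencode(params)}"


def fetch_fragment(url, timeout=10):
    """Download the HTML that the onClick handler would have loaded."""
    # Some servers reject the default urllib User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (parameter names and values are made up for illustration):
# url = build_url(AJAX_ENDPOINT, {"catoid": 16, "coid": 12345})
# html = fetch_fragment(url)
```

The returned string is ordinary HTML, so it can be passed to BeautifulSoup in the same way as the original page.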

Gamic answered 4/2, 2018 at 0:4
6

You can use the requests-html library (https://pypi.org/project/requests-html/) to render JavaScript content and then use BeautifulSoup to parse the result.

Example:

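from requests_html import HTMLSession

def render_JS(URL):
    # render() runs the page's JavaScript in headless Chromium
    # (downloaded automatically the first time it is called).
    session = HTMLSession()
    r = session.get(URL)
    r.html.render()
    # Return the rendered markup rather than r.html.text,
    # so that BeautifulSoup has HTML to parse.
    return r.html.html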
Poundfoolish answered 1/2, 2020 at 16:36
4

You can't do this using BeautifulSoup alone. It was created to parse HTML (HyperText Markup Language), not to interpret JavaScript, CSS, or any other web language.

It can extract the text between <script></script> tags (which can be quite useful), but it cannot run that code, so beyond this BeautifulSoup is not what you need.
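To make that limit concrete, here is a small sketch of what BeautifulSoup can read from a page: the source text of a script block and the contents of an onclick attribute, both as plain strings. The sample HTML and the function name showCourse are made up for illustration and are not taken from the site in the question.

```python
from bs4 import BeautifulSoup

# Made-up page used only for illustration.
SAMPLE_HTML = """
<html><body>
<a href="#" onclick="showCourse(16, 1849)">Example course</a>
<script>function showCourse(catoid, coid) { /* loads details */ }</script>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Source text of every inline <script> block.
scripts = [tag.string for tag in soup.find_all("script") if tag.string]

# The onclick attribute is an ordinary string; BeautifulSoup never executes it.
handlers = [a["onclick"] for a in soup.find_all("a", onclick=True)]

print(handlers)
```

The arguments in such a handler string are often the parameters of the request the JavaScript sends, which is what makes reading them useful even though nothing is executed.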

To call a JavaScript function you will need a headless browser such as PhantomJS, or a real browser driven by Selenium. There have also been attempts to parse JavaScript with regular expressions (which is not a good idea) and with other methods (recommended); some of these are described in this question and may be useful.

Glaudia answered 4/2, 2018 at 0:15
