Hidden parts in HTML source code while scraping (Python)
R

2

7

So I want to scrape the 'Buy price' integer from this url: https://rsbuddy.com/exchange?id=5502

But when I look at the source code, I can't find those prices, and neither can my BeautifulSoup scraper. This is the output from BeautifulSoup:

<div class="col-md-7" id="buy-price">
    ---
</div>

But when I 'inspect element' using Chrome, I am actually able to see the price:

<div id="buy-price" class="col-md-7">29,990 gp</div>

Why is that part of the code 'hidden'? Is it simply because they don't want people to scrape from their website? Is there a way to get around this?

Thanks in advance

EDIT: I found the answer by tracking the requests made by the page's JavaScript using the Chrome developer tools. Apparently, even though api.rsbuddy.com doesn't give you anything on its own, the page does use the API: https://api.rsbuddy.com/grandExchange?a=guidePrice&i=5502
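For anyone landing here later, a minimal sketch of calling that endpoint directly from Python, using only the standard library. The thread does not show what the response looks like, so treating the body as JSON is an assumption, as are the helper names `guide_price_url` and `fetch_guide_price` and the User-Agent string. The endpoint itself may also have changed since this was posted.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint quoted in the edit above; it may no longer be available.
API_BASE = "https://api.rsbuddy.com/grandExchange"


def guide_price_url(item_id):
    """Build the guide-price URL for one item id (hypothetical helper)."""
    return API_BASE + "?" + urlencode({"a": "guidePrice", "i": item_id})


def fetch_guide_price(item_id, timeout=10):
    """Request the guide price and decode the body.

    Assumption: the body is JSON. The field names inside it are not known
    from this thread, so the decoded object is returned as-is.
    """
    request = Request(
        guide_price_url(item_id),
        headers={"User-Agent": "price-check-sketch"},  # arbitrary label
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_guide_price(5502)` would request the same URL the page uses for item 5502. Inspect the returned object first to see which key holds the buy price.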

Rozalie answered 3/7, 2015 at 15:0 Comment(0)
C
1

The prices are presumably being put in there by JavaScript. Likely they're using some sort of AJAX to get the prices. You'll have to investigate their JavaScript to get the data you want.

Just to clarify, it's not "hidden" per se; it's just not in the HTML the server sends. Inspect Element shows the live document: the HTML as originally loaded plus any changes the JavaScript has made to it since.
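To make the difference concrete, here is a small sketch that reads the `buy-price` element from the two snippets quoted in the question: the HTML as served, and the element as seen in the inspector. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages. The names `TextById`, `text_of`, and `parse_gp` are made up for this illustration.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id.

    Limitation: unclosed void tags such as <br> inside the target
    element would throw off the depth counter.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False   # True once the target element has been seen
        self.depth = 0       # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_of(html, element_id):
    """Return the stripped text of the element with the given id."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def parse_gp(text):
    """Turn '29,990 gp' into 29990; return None for the '---' placeholder."""
    digits = text.replace(",", "").replace("gp", "").strip()
    return int(digits) if digits.isdigit() else None


# What the server sends (and what a plain HTML scraper sees).
static_html = '<div class="col-md-7" id="buy-price">\n    ---\n</div>'
# What the inspector shows after JavaScript has filled in the value.
rendered_html = '<div id="buy-price" class="col-md-7">29,990 gp</div>'

print(parse_gp(text_of(static_html, "buy-price")))    # None
print(parse_gp(text_of(rendered_html, "buy-price")))  # 29990
```

The scraper is not being blocked. It is simply parsing the first snippet, which only ever contains the placeholder.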

Corin answered 3/7, 2015 at 15:2 Comment(3)
I see, that explains it, thank you. So do you think I would be able to find those prices if I dug deeper into the JS code? - Rozalie
Well, I don't think you're going to find the prices in the JS code. Assuming I'm correct and the JS is using AJAX, if you dig deep enough you should be able to find what HTTP address they're coming from (and said address may be automatically generated based on the item name, id, or whatever). - Corin
Thanks. I asked the devs to help me out with it. If anyone would know whether it is possible, it would be them :) - Rozalie
U
3

If certain parts of the page are inserted via JavaScript, your best bet is to use something like Selenium with PhantomJS as the driver.

The Python bindings are quite easy to use. They let the JavaScript execute in the browser, and you can then grab the prices from the rendered page.
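A rough sketch of that approach, under a few stated assumptions. PhantomJS support was removed in Selenium 4, so the setup helper below swaps in headless Chrome instead; that is a substitution, not what was originally suggested. The function names, the 10-second timeout, and the polling loop are illustrative choices, and the `buy-price` id and `---` placeholder come from the question.

```python
import time

PLACEHOLDER = "---"  # text the static HTML shows before JavaScript fills it in


def read_buy_price_text(driver, url, timeout=10.0, poll=0.5):
    """Load url and return the #buy-price text once it is no longer the placeholder.

    `driver` is any Selenium WebDriver-like object that offers get() and
    find_element(by, value).
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "id" is the locator strategy string that Selenium's By.ID stands for.
        text = driver.find_element("id", "buy-price").text.strip()
        if text and text != PLACEHOLDER:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("price was not filled in within %s seconds" % timeout)
        time.sleep(poll)


def make_headless_chrome():
    """Assumed setup: Selenium 4 and a local Chrome installation."""
    # Imported lazily so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)
```

Usage would look something like `driver = make_headless_chrome()`, then `read_buy_price_text(driver, "https://rsbuddy.com/exchange?id=5502")`, followed by `driver.quit()`. Selenium's own `WebDriverWait` can replace the hand-written polling loop if you prefer.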

Let me know if you want any more information and I'd be happy to help.

Uncoil answered 3/7, 2015 at 15:19 Comment(0)
