Scrapy splash not working correctly when searching for items loaded with JS

I'm using Scrapy with scrapy-splash to get data from some URLs, such as this product URL or this second product URL.

I have a Lua script that waits for a fixed time and then returns the HTML:

script = """
function main(splash)
  assert(splash:go(splash.args.url))
  assert(splash:wait(4))
  return splash:html()
end
"""

Then I execute it:

yield SplashRequest(url, self.parse_item, args={'lua_source': script}, endpoint='execute')

From the returned page I need 3 elements: the 3 different product prices. All 3 are loaded with JS.

I have the XPath expressions to get the 3 elements, but the problem is that sometimes they work and sometimes they don't:

    price_strikethrough = response.xpath('//div[@class="price-selector"]/div[@class="prices"]/span[contains(@class,"active-price strikethrough")]/span[1]/text()').extract_first() 
    price_offer1 = response.xpath('//div[@class="price-selector"]/div[@class="prices"]/div[contains(@class,"precioDescuento")][1]/text()').extract_first()
    price_offer2 = response.xpath('//div[@class="price-selector"]/div[@class="prices"]/div[contains(@class,"precioDescuento")][2]/text()').extract_first()

I don't know what else to do to make it work reliably. I have tried changing the wait value, but the result is the same: sometimes it works fine, and sometimes I don't get the data. How can I make sure I always get the data I need?

Aliciaalick answered 11/1, 2020 at 7:0 Comment(1)
The website has an incredibly long response time; I tried a few times and the average is around 3 s. Could you retry after raising the wait time above 5 seconds? - Seguidilla

There is nothing wrong with your approach; the issue seems to be with the website. The site takes a variable amount of time to calculate the prices, so you need to increase the wait in your Lua script to around 7 to 8 seconds.
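
Since the load time varies, a fixed wait can still be too short on a slow response. One possible alternative is to poll for a price element and return as soon as it appears. The following is only a sketch under assumptions: it relies on `splash:select`, which needs Splash 3.0 or newer, and the names `WAIT_FOR_SELECTOR_SCRIPT`, `build_splash_args`, `wait_selector` and `max_wait` are hypothetical helpers introduced here, not part of scrapy-splash. The CSS selector is derived from the XPath in the question and has not been tested against the site.

```python
# Sketch: poll for the price element instead of sleeping for a fixed time.
# The script, helper name, argument names and defaults are assumptions.

WAIT_FOR_SELECTOR_SCRIPT = """
function main(splash)
  assert(splash:go(splash.args.url))
  local max_wait = splash.args.max_wait
  local waited = 0
  -- Check every 0.5 s until the element exists or the time budget is spent.
  while not splash:select(splash.args.wait_selector) and waited < max_wait do
    splash:wait(0.5)
    waited = waited + 0.5
  end
  return splash:html()
end
"""


def build_splash_args(wait_selector, max_wait=10):
    """Return the args dict for a SplashRequest that uses the polling script."""
    return {
        "lua_source": WAIT_FOR_SELECTOR_SCRIPT,
        "wait_selector": wait_selector,  # CSS selector to wait for
        "max_wait": max_wait,            # upper bound in seconds
    }


# Inside the spider (requires scrapy-splash), the request would look like:
# yield SplashRequest(
#     url, self.parse_item, endpoint='execute',
#     args=build_splash_args('div.prices div[class*="precioDescuento"]'),
# )
```

Keep `max_wait` well below Splash's render timeout (30 seconds by default), otherwise the request fails before the script returns. If the element never appears, the script still returns the HTML, so `parse_item` should handle `extract_first()` returning `None`.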

Hildredhildreth answered 16/1, 2020 at 12:5 Comment(0)
