Click display button in Scrapy-Splash

Asked 25/6, 2019 at 16:6 Answered 3/7, 2019 at 22:8

Solved python web-scraping scrapy splash-screen scrapy-splash

I am scraping http://www.starcitygames.com/buylist/ with scrapy-splash. The page requires a login, which works fine. The data I need only appears after I click the Display button, so I have to click it before I can scrape anything.

I already got an answer saying that I cannot simply click the button and scrape what shows up, and that I should scrape the JSON page associated with that data instead. I am concerned that requesting the JSON directly will be a red flag to the site owners: most people never open the JSON page, and a human would need several minutes to find it, whereas a script gets there much faster.

So my question is: is there any way to scrape the page by clicking Display and going from there, or do I have no choice but to scrape the JSON page? This is what I have so far, but it does not click the button.

import scrapy
from ..items import NameItem

class LoginSpider(scrapy.Spider):
    name = "LoginSpider"
    start_urls = ["http://www.starcitygames.com/buylist/"]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formcss='#existing_users form',
            formdata={'ex_usr_email': '[email protected]', 'ex_usr_pass': 'password'},
            callback=self.after_login,
        )

    def after_login(self, response):
        item = NameItem()
        display_button = response.xpath('//a[contains(., "Display>>")]/@href').get()

        yield response.follow(display_button, self.parse)

        item["Name"] = response.css("div.bl-result-title::text").get()
        return item

Fernandefernandel answered 25/6, 2019 at 16:6 Comment(2)

If you are not interested in the JSON response, use a browser automation tool such as Selenium to click that button and parse the result as you see it on the page. Splash might be the best option, but I'm not familiar with it yet, so I can't say for sure. – Sick 25/6, 2019 at 18:10

I don't see Splash anywhere in your code. You mention it but are not using it. If you follow the article blog.scrapinghub.com/2015/03/02/…, you will find that what you need is a very simple case. The only problem is that you are using the normal Scrapy request object and not the SplashRequest object. – Pretermit 27/6, 2019 at 18:38

You can use your browser's developer tools to track the request that the click event triggers. The response is in a nice JSON format, and the request needs no cookie (login):

http://www.starcitygames.com/buylist/search?search-type=category&id=5061

The only thing you need to fill in is the category id for this request (the id parameter in the URL above). It can be extracted from the HTML and declared in your code.

Category name:

//*[@id="bl-category-options"]/option/text()

Category id:

//*[@id="bl-category-options"]/option/@value

Working with JSON is much simpler than parsing HTML.
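As a rough sketch of the steps above, the category ids can be collected from the page and turned into search URLs. This version uses only the standard library so it runs on its own. It assumes that the element with id bl-category-options is a select whose option values are the category ids; the names CategoryOptionParser and search_url and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Endpoint taken from the request URL shown in this answer.
SEARCH_URL = "http://www.starcitygames.com/buylist/search"


class CategoryOptionParser(HTMLParser):
    """Collect (category_id, category_name) pairs from the category <select>.

    Assumption: the element with id "bl-category-options" is a <select>.
    """

    def __init__(self):
        super().__init__()
        self.categories = []
        self._in_select = False
        self._current_id = None
        self._text = []

    def _flush(self):
        # Store the option read so far; options with an empty value are skipped.
        if self._current_id:
            name = "".join(self._text).strip()
            self.categories.append((self._current_id, name))
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == "bl-category-options":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._flush()  # covers markup that omits </option>
            self._current_id = attrs.get("value")

    def handle_data(self, data):
        if self._current_id:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._flush()
        if tag == "select":
            self._in_select = False


def search_url(category_id):
    """Build the JSON search URL for one category id."""
    query = urlencode({"search-type": "category", "id": category_id})
    return f"{SEARCH_URL}?{query}"


if __name__ == "__main__":
    # Hypothetical markup, only to show the shape of the result.
    sample = (
        '<select id="bl-category-options">'
        '<option value="">Choose a category</option>'
        '<option value="5061">Sample Set</option>'
        "</select>"
    )
    parser = CategoryOptionParser()
    parser.feed(sample)
    for category_id, name in parser.categories:
        print(name, search_url(category_id))
```

Inside a Scrapy callback, the two XPath expressions above would do the extraction instead of the parser class, and each URL could be passed to scrapy.Request, with json.loads(response.text) decoding the result.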

Glossematics answered 28/6, 2019 at 4:44 Comment(1)

See also docs.scrapy.org/en/latest/topics/dynamic-content.html – Lafond 20/11, 2019 at 8:36

I tried emulating the click with scrapy-splash, using a Lua script. It works; you just have to integrate it with Scrapy and process the content. Here is the script, which still needs to be integrated with Scrapy.

function main(splash)
  -- Log in first: fill in the form fields and submit
  local url = 'https://www.starcitygames.com/login'
  assert(splash:go(url))
  assert(splash:wait(0.5))
  assert(splash:runjs('document.querySelector("#ex_usr_email_input").value = "[email protected]"'))
  assert(splash:runjs('document.querySelector("#ex_usr_pass_input").value = "your_password"'))
  splash:wait(0.5)
  assert(splash:runjs('document.querySelector("#ex_usr_button_div button").click()'))
  splash:wait(3)
  -- Open the buylist page with the logged-in session
  splash:go('https://www.starcitygames.com/buylist/')
  splash:wait(2)
  -- Click the second .bl-specific-name element (index 1), then the #bl-search-category button
  assert(splash:runjs('document.querySelectorAll(".bl-specific-name")[1].click()'))
  splash:wait(1)
  assert(splash:runjs('document.querySelector("#bl-search-category").click()'))
  splash:wait(3)
  -- Resize the viewport before taking the screenshot
  splash:set_viewport_size(1200,2000)
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end

Acidfast answered 3/7, 2019 at 22:8 Comment(0)
