Scrapy Splash click button doesn't work

What I'm trying to do

On avito.ru (a Russian real estate site), a person's phone number is hidden until you click a button. I want to collect the phone number using Scrapy + Splash.

Example URL: https://www.avito.ru/moskva/kvartiry/2-k_kvartira_84_m_412_et._992361048

screenshot: Phone is hidden

After you click the button, a pop-up is displayed and the phone number is visible.

I'm using the Splash execute API with the following Lua script:

function main(splash)
    splash:go(splash.args.url)
    splash:wait(10)
    splash:runjs("document.getElementsByClassName('item-phone-button')[0].click()")
    splash:wait(10)
    return splash:png()
end
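
For reference, a minimal sketch of sending this script to the execute endpoint from plain Python, using only the standard library. The Splash address (http://localhost:8050) and the function names are assumptions for illustration and are not taken from the linked repo. The endpoint takes a JSON body with lua_source plus any arguments the script reads from splash.args.

```python
import json
import urllib.request

# The Lua script from the question, unchanged.
LUA_SOURCE = """
function main(splash)
    splash:go(splash.args.url)
    splash:wait(10)
    splash:runjs("document.getElementsByClassName('item-phone-button')[0].click()")
    splash:wait(10)
    return splash:png()
end
"""


def build_execute_request(lua_source, page_url, splash_base="http://localhost:8050"):
    """Build a POST request for Splash's /execute endpoint (nothing is sent here)."""
    payload = {"lua_source": lua_source, "url": page_url}
    return urllib.request.Request(
        splash_base.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render_png(page_url, timeout=90):
    """Send the script to Splash and return the raw PNG bytes from splash:png()."""
    request = build_execute_request(LUA_SOURCE, page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling render_png(url) needs a running Splash instance, so it only works once the container is up.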

Problem

The button is not clicked and the phone number is not displayed. This should be a trivial task, and I have no explanation for why it doesn't work.

The click works fine for another element on the same page if we replace item-phone-button with js-show-stat. So JavaScript works in general, and the blue "Display phone" button must be special somehow.

What I've tried

To isolate the problem, I created a repo with a minimal example script and a docker-compose file for Splash: https://github.com/alexanderlukanin13/splash-avito-phone

The JavaScript code is valid; you can verify it in the JavaScript console in Chrome or Firefox:

document.getElementsByClassName('item-phone-button')[0].click()

I've tried it with Splash versions 3.0, 3.1, and 3.2; the result is the same.

Update

I've also tried:

Aeolotropic answered 14/3, 2018 at 11:19 Comment(0)

The following script works for me:

function main(splash, args)
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  btn = splash:select_all('.item-phone-button')[2]
  btn:mouse_click()
  btn.style.border = "5px solid black"
  assert(splash:wait(0.5))
  return {
    num = #splash:select_all('.item-phone-button'),
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end

There were 2 issues with the original solution:

  1. There are 2 elements with the 'item-phone-button' class, and the button of interest is the second one. I checked which element is matched by setting btn.style.border = "5px solid black".
  2. This website requires private mode to be disabled, likely because it uses localStorage. See http://splash.readthedocs.io/en/stable/faq.html#website-is-not-rendered-correctly for other common suggestions.
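
When a script returns a table like the one above, Splash replies with JSON, and binary values such as the PNG arrive base64-encoded. Here is a small sketch of unpacking that reply in Python, assuming the response body has already been fetched. unpack_result is a hypothetical helper name, not part of Splash or Scrapy.

```python
import base64
import json


def unpack_result(body):
    """Split the JSON reply of the script into (match count, PNG bytes).

    `body` is the raw response body (bytes or str) of the /execute call.
    """
    result = json.loads(body)
    # `num` is how many elements matched '.item-phone-button'.
    num = int(result["num"])
    # Binary values inside a returned table are base64-encoded by Splash.
    png = base64.b64decode(result["png"])
    return num, png
```

A num of 2 confirms that the class matches more than one element, so the index used in the script matters.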
Smack answered 20/3, 2018 at 14:55 Comment(1)
Actually, just adding splash.private_mode_enabled = false to the original script does the job. Thanks Mike! - Aeolotropic

I don't know how your implementation works, but I suggest renaming main to parse, the default callback that spiders call on start.

If this isn't the problem, the first thing to do is check that you have picked the right element of that class, using JavaScript with a CSS selector. There may be another element with the item-phone-button class, in which case you are clicking in the wrong place.
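
One way to run that check through Splash itself is to outline and screenshot each match in turn. The sketch below only builds the JSON payloads for the execute endpoint. selector and index are hypothetical argument names that the Lua script reads from splash.args, and private mode is turned off on the assumption that the site needs it.

```python
# Lua probe: outline the match at args.index and return a screenshot.
PROBE_LUA = """
function main(splash, args)
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  local matches = splash:select_all(args.selector)
  local el = assert(matches[args.index], "no match at this index")
  el.style.border = "5px solid black"
  assert(splash:wait(0.5))
  return splash:png()
end
"""


def probe_payloads(page_url, selector, count):
    """Build one /execute payload per candidate index (Lua tables start at 1)."""
    return [
        {"lua_source": PROBE_LUA, "url": page_url, "selector": selector, "index": i}
        for i in range(1, count + 1)
    ]
```

Each payload can be posted as JSON to the execute endpoint; the screenshot with the outline around the blue button tells you which index to click.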

If all of the above is correct, I suggest two options that worked for me:

  • Use Splash mouse_click and Splash wait (I see you already use the latter). If that doesn't work, try a double click by substituting this into your code:
    local button = splash:select('.item-phone-button')
    button:mouse_click()
    button:mouse_click()
    

  • Use Splash wait_for_resume, which runs JavaScript code until it calls splash.resume() and then resumes the Lua script. Your code becomes simpler too:
    function main(splash)
        splash:go(splash.args.url)
        -- Lua pauses here until the JavaScript side calls splash.resume()
        splash:wait_for_resume([[
            function main(splash) {
                document.getElementsByClassName('item-phone-button')[0].click();
                splash.resume();
            }
        ]])
        return splash:png()
    end
    

    EDIT: it seems better to use dispatchEvent instead of click(), as in this example:

    function simulateClick() {
      var event = new MouseEvent('click', {
        view: window,
        bubbles: true,
        cancelable: true
      });
      var cb = document.getElementById('checkbox'); 
      var cancelled = !cb.dispatchEvent(event);
      if (cancelled) {
        // A handler called preventDefault.
        alert("cancelled");
      } else {
        // None of the handlers called preventDefault.
        alert("not cancelled");
      }
    }
    
Malvina answered 14/3, 2018 at 13:55 Comment(3)
Thank you for your answer, it makes sense and I've upvoted it. Unfortunately, none of this advice works in my case. I suspect that there is a compatibility issue between Splash's WebKit version and this particular site. - Aeolotropic
Thank you. Take a look here: developer.mozilla.org/en-US/docs/Web/Guide/Events/… - Malvina
FTR, splash:mouse_click is better than any JS-based click function (via MouseEvent, etc.), because it sends a real mouse click event to the browser window. - Smack
