Form Request Using Scrapy + Splash

I am trying to log in to a website using the following code (slightly modified for this post):

import scrapy
from scrapy_splash import SplashRequest
from scrapy.crawler import CrawlerProcess

class Login_me(scrapy.Spider):
    name = 'espn'
    allowed_domains = ['games.espn.com']
    start_urls = ['http://games.espn.com/ffl/leaguerosters?leagueId=774630']

    def start_requests(self):
        script = """
        function main(splash)
                local url = splash.args.url

                assert(splash:go(url))
                assert(splash:wait(10))

                local search_input = splash:select('input[type=email]')   
                search_input:send_text("user email")

                local search_input = splash:select('input[type=password]')
                search_input:send_text("user password!")

                assert(splash:wait(10))
                local submit_button = splash:select('input[type=submit]')
                submit_button:click()

                assert(splash:wait(10))

                return splash:html()
              end
            """

        yield SplashRequest(
            'http://games.espn.com/ffl/leaguerosters?leagueId=774630',
            callback=self.after_login,
            endpoint='execute',
            args={'lua_source': script}
            )

    # Callback must be a method of the spider, not nested inside start_requests
    def after_login(self, response):
        table = response.xpath('//table[@id="playertable_0"]')
        for player in table.css('tr[id]'):
            item = {
                'id': player.css('::attr(id)').extract_first(),
            }
            print(item)
            yield item

I am getting the error:

<GET http://games.espn.com/ffl/signin?redir=http%3A%2F%2Fgames.espn.com%2Fffl%2Fleaguerosters%3FleagueId%3D774630> from <GET http://games.espn.com/ffl/leaguerosters?leagueId=774630>
2018-12-14 16:49:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://games.espn.com/ffl/signin?redir=http%3A%2F%2Fgames.espn.com%2Fffl%2Fleaguerosters%3FleagueId%3D774630> (referer: None)
2018-12-14 16:49:04 [scrapy.core.scraper] ERROR: Spider error processing <GET http://games.espn.com/ffl/signin?redir=http%3A%2F%2Fgames.espn.com%2Fffl%2Fleaguerosters%3FleagueId%3D774630> (referer: None)

I am still not able to log in. I have gone through many posts on here and tried many variations of "splash:select", but I can't find the problem. When I inspect the page in Chrome, I see this (the password field has similar HTML):

 <input type="email" placeholder="Username or Email Address" autocapitalize="none" autocomplete="on" autocorrect="off" spellcheck="false" ng-model="vm.username" 
ng-pattern="/^[^<&quot;>]*$/" ng-required="true" did-disable-validate="" ng-focus="vm.resetUsername()" class="ng-pristine ng-invalid ng-invalid-required 
ng-valid-pattern ng-touched" tabindex="0" required="required" aria-required="true" aria-invalid="true">

I believe the HTML above is generated by JavaScript, so I cannot grab it with plain Scrapy. I viewed the page source, and I think the relevant JS code to use with Splash is this (not sure, though):

function authenticate(params) {
        return makeRequest('POST', '/guest/login', {
            'loginValue': params.loginValue,
            'password': params.password
        }, {
            'Authorization': params.authorization,
            'correlation-id': params.correlationId,
            'conversation-id': params.conversationId,
            'oneid-reporting': buildReportingHeader(params.reporting)
        }, {
            'langPref': getLangPref()
        });
    }

Can someone nudge me in the right direction?

Haroldson answered 14/12, 2018 at 22:56

The main problem here is that the login form is inside an iframe element. I do not know scrapy_splash, so the proof-of-concept code below uses Selenium and Beautiful Soup. The mechanism should be similar with Splash: you need to switch into the iframe, then switch back to the main document once the iframe disappears.

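# Written against the Selenium 4 API (find_element(By...)); the 30 s timeout is arbitrary.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

USER = 'theUser'
PASS = 'thePassword'

driver = webdriver.Firefox()
wait = WebDriverWait(driver, 30)
driver.get('http://games.espn.com/ffl/leaguerosters?leagueId=774630')

# The login form lives inside an iframe: wait for it, then switch into it.
wait.until(EC.frame_to_be_available_and_switch_to_it(
    (By.CSS_SELECTOR, 'iframe#disneyid-iframe')))
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "input[type='email']"))).send_keys(USER)
driver.find_element(By.CSS_SELECTOR, "input[type='password']").send_keys(PASS)
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()

# Switch back to the main document and wait for the login iframe to disappear.
driver.switch_to.default_content()
wait.until(EC.invisibility_of_element_located(
    (By.CSS_SELECTOR, 'iframe#disneyid-iframe')))
soup_level1 = BeautifulSoup(driver.page_source, 'html.parser')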

For this code to work, Firefox and geckodriver must be installed and on the PATH, in mutually compatible versions.
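
To check the iframe diagnosis from Splash itself, something like the following may help. It is only a sketch and it does not log in: the Lua script reports whether `input[type=email]` exists in the top-level document and lists the iframes it sees. The helper names `splash_kwargs` and `describe` are hypothetical, made up for this illustration; the 10-second wait is copied from the question and may need tuning.

```python
# Lua script for Splash's `execute` endpoint. It only inspects the page;
# it does not attempt to log in.
DIAGNOSTIC_SCRIPT = """
function main(splash)
    assert(splash:go(splash.args.url))
    assert(splash:wait(10))

    -- splash:select returns nil when nothing in the top-level document matches
    local email = splash:select('input[type=email]')

    -- Collect "id src" for every iframe in the top-level document
    local frames = splash:evaljs([[
        Array.prototype.map.call(
            document.querySelectorAll('iframe'),
            function (f) { return f.id + ' ' + f.src; })
    ]])

    return {email_found = (email ~= nil), iframes = frames}
end
"""


def splash_kwargs(url, callback):
    """Keyword arguments for scrapy_splash.SplashRequest (hypothetical helper)."""
    return {
        'url': url,
        'callback': callback,
        'endpoint': 'execute',
        'args': {'lua_source': DIAGNOSTIC_SCRIPT},
    }


def describe(data):
    """Turn the table returned by the script into a one-line summary."""
    frames = list(data.get('iframes') or [])
    if data.get('email_found'):
        return 'email input is in the top-level document'
    if frames:
        return 'email input not found; iframes present: ' + ', '.join(frames)
    return 'email input not found and no iframes seen'
```

In the spider this would be used as `yield SplashRequest(**splash_kwargs(url, self.parse_diagnostics))`, with the callback logging `describe(response.data)`. If the summary says the input was not found while an iframe such as `disneyid-iframe` is listed, that would be consistent with the explanation above: `splash:select` in the question returns nil, so the subsequent `send_text` call fails before any login is attempted.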

Rowdyism answered 20/12, 2018 at 19:19