Scrapy, how to change value in input form, submit and then scrape page
I want to input a value into a text input field, submit the form, and then scrape the updated data on the page. How is this possible?

This is the HTML form on the page. I want to change the input value from 10 to 100 and submit the form:

<form action="https://de.iss.fst.com/ba-u6-72-nbr-902-112-x-140-x-13-12-mm-simmerringr-ba-a-mit-feder-fst-40411416#product-offers-anchor" method="post" _lpchecked="1">
            <div class="fieldset">
               <div class="field qty">
                  <div class="control">
                        <label class="label" for="qty-2">
                           <span>Preise für</span>
                        </label>
                        <input type="text" name="pieces" class="validate-length maximum-length-10 qty" maxlength="12" id="qty-2" value="10">
                        <label class="label" for="qty-2">
                           <span>Teile</span>
                        </label>
                        <span class="actions">
                           <button type="submit" title="Absenden" class="action">
                              <span>Absenden</span>
                           </button>
                        </span>
                  </div>
               </div>
            </div>
      </form>

Update! New working code.

import scrapy
from scrapy_splash import SplashFormRequest
from issfst.items import IssfstItem


class IssSpider(scrapy.Spider):
    name = "issfst_spider"
    start_urls = ["https://de.iss.fst.com/dichtungen/radialwellendichtringe/rwdr-mit-geschlossenem-kafig/ba"]
    custom_settings = {
        # specifies exported fields and order
        'FEED_EXPORT_FIELDS': ["imgurl",
                               "Produktdatenblatt",
                               "Materialdatenblatt",
                               "Beschreibung"]
    }

    def parse(self, response):
        self.log("I just visited: " + response.url)
        urls = response.css('.details-button > a::attr(href)').extract()

        for url in urls:
            formdata = {'pieces': '200'}
            yield SplashFormRequest.from_response(
                response,
                url=url,
                formdata=formdata,
                callback=self.parse_details,
                args={'wait': 3}
            )

        # follow pagination link
        next_page_url = response.css('li.item > a.next::attr(href)').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)

    def parse_details(self, response):
        item = IssfstItem()
        # scrape image url (no trailing commas here -- they would wrap each value in a tuple)
        item['imgurl'] = response.css('img.fotorama__img::attr(src)').extract()
        # scrape download pdf links
        item['Produktdatenblatt'] = response.css('a.action[data-group="productdatasheet"]::attr(href)').extract_first()
        item['Materialdatenblatt'] = response.css('a.action[data-group="materialdatasheet"]::attr(href)').extract_first()
        item['Beschreibung'] = response.css('.description > p::text').extract_first()
        yield item
Dissemble answered 5/9, 2019 at 15:19 Comment(0)

You shouldn't rely on the HTML source alone to find the parameter names of a POST request. Instead, open your browser's developer tools, watch the Network tab (with log preservation enabled), and submit the form to see the actual request.

So you are looking at the URL https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424#product-offers-anchor, and you need to POST to it with the parameters pieces and form_key.

(Screenshot: Firefox's developer tools, French version, showing the POST request and its parameters.)

Your error is that you set the form data with the wrong field name, 'value', while the website expects the name 'pieces'.

Now, as a demo in a scrapy shell session:

scrapy shell "https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424"
...
from scrapy import FormRequest

## SETTING THE POST PARAMETERS
form_key = response.css('[name="form_key"]::attr(value)').get()
# Note: response.xpath('input[@name="form_key"]/@value') returns nothing
# because that path is relative to the document node; use
# response.xpath('//input[@name="form_key"]/@value') or the CSS selector above.
pieces = "100"
form_data = {'form_key': form_key, 'pieces': pieces}  # with the correct names

## POST THE REQUEST
fetch(
    FormRequest(
        'https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424#product-offers-anchor',
        formdata=form_data,
    )
)  # note the '#product-offers-anchor' suffix on the URL; without it the request won't work
view(response)  # open the response in your default browser

Now you can adapt the above to your code.

Broadcaster answered 8/9, 2019 at 10:13 Comment(3)
Thanks! I figured this out myself last week :) I've updated my post with the working code. Maybe this will help someone. – Dissemble
@Dissemble I am surprised you don't need the form_key parameter in the form. If you don't, that's OK; just know that sometimes all parameters must be set in the form to get the right response from the page. Knowing this could save you a future puzzle. – Broadcaster
Oh OK, then I will implement the form_key parameter. Thanks! – Dissemble
