Python Scrapy not always downloading data from website


Asked 29/11, 2013 at 15:55 Answered 31/1, 2014 at 11:57

I'm using Scrapy to parse an HTML page. Why does Scrapy sometimes return the response I want and sometimes not? Is it my fault? Here's my spider with its parsing function:

class AmazonSpider(BaseSpider):
    name = "amazon"
    allowed_domains = ["amazon.com"]
    start_urls = [
        "http://www.amazon.com/s?rh=n%3A283155%2Cp_n_feature_browse-bin%3A2656020011"
    ]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[contains(@class, "result")]')
        items = []
        titles = {'titles': sites[0].xpath('//a[@class="title"]/text()').extract()}
        for title in titles['titles']:
            item = AmazonScrapyItem()
            item['title'] = title
            items.append(item)
        return items

Maressa answered 29/11, 2013 at 15:55 Comment(4)

Could you include the log messages of a run where you don't get the response? – Heaviness 30/11, 2013 at 5:46

Hello. Do you have any new information about this? I have a similar issue: #20723871 – Monniemono 21/12, 2013 at 20:47

What I did was check whether the titles are empty. If they are empty, I request the same link again, which I take from response.url. Pretty dumb solution, but it works. – Maressa 3/1, 2014 at 11:36

@Maressa would you consider adding an answer that briefly describes the solution you chose? – Hydrocephalus 7/2, 2019 at 15:6
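
The retry workaround described in the comments above could look roughly like the sketch below. It is not Maressa's actual code: the helper name next_retry_meta, the retry_count meta key and the limit of 3 attempts are all assumptions made for illustration. The cap is there so that a page which really has no titles does not get requested forever.

```python
MAX_RETRIES = 3  # assumed limit, not taken from the original thread


def next_retry_meta(titles, meta, max_retries=MAX_RETRIES):
    """Return the meta dict for a retry request, or None if no retry is due.

    titles: the list of extracted titles (empty means the page came back
            without the expected content).
    meta:   the meta dict of the current response (response.meta in Scrapy).
    """
    if titles:
        return None  # extraction worked, nothing to retry
    attempts = meta.get("retry_count", 0)
    if attempts >= max_retries:
        return None  # give up instead of looping forever
    return {"retry_count": attempts + 1}


# Inside the spider's parse() this might be wired up roughly as follows
# (Request is scrapy's Request class; dont_filter=True is needed because
# Scrapy's duplicate filter would otherwise drop the repeated URL):
#
#     retry_meta = next_retry_meta(titles, response.meta)
#     if retry_meta is not None:
#         return [Request(response.url, callback=self.parse,
#                         meta=retry_meta, dont_filter=True)]
```

Keeping the decision in a small function makes it easy to check without running a crawl. If the empty pages turn out to be caused by Amazon serving a different layout, a better XPath is still the more robust fix.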

I believe you are just not using the most adequate XPath expression.

Amazon's HTML is kinda messy, not very uniform, and therefore not very easy to parse. But after some experimenting I could extract all 12 titles from a couple of search result pages with the following parse function:

def parse(self, response):
    sel = Selector(response)
    p = sel.xpath('//div[@class="data"]/h3/a')
    titles = p.xpath('span/text()').extract() + p.xpath('text()').extract()
    items = []
    for title in titles:
        item = AmazonScrapyItem()
        item['title'] = title
        items.append(item)
    return items

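If you care about the actual order of the results, the above code might not be appropriate, since it lists the titles found inside span elements before the ones found directly in the links. But I believe that is not the case.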
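
If the order does matter, one option is to visit the links one at a time and take the full text of each, instead of concatenating two separate result lists. The sketch below shows the difference using only the standard library, so it runs without Scrapy. The sample markup is made up and much simpler than a real Amazon page, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: some titles sit directly in <a>, others in a <span>.
SAMPLE = """
<html><body>
  <div class="data"><h3><a href="#1">First Book</a></h3></div>
  <div class="data"><h3><a href="#2"><span>Second Book</span></a></h3></div>
  <div class="data"><h3><a href="#3">Third Book</a></h3></div>
</body></html>
"""

LINKS = ".//div[@class='data']/h3/a"


def titles_concatenated(markup):
    """Mimic the approach above: span texts first, then direct link texts."""
    links = ET.fromstring(markup.strip()).findall(LINKS)
    from_spans = [s.text for a in links for s in a.findall("span") if s.text]
    from_links = [a.text.strip() for a in links if a.text and a.text.strip()]
    return from_spans + from_links


def titles_in_page_order(markup):
    """Take the whole text of each link, one link at a time."""
    titles = []
    for link in ET.fromstring(markup.strip()).findall(LINKS):
        text = "".join(link.itertext()).strip()
        if text:
            titles.append(text)
    return titles


print(titles_concatenated(SAMPLE))   # ['Second Book', 'First Book', 'Third Book']
print(titles_in_page_order(SAMPLE))  # ['First Book', 'Second Book', 'Third Book']
```

With a Scrapy selector the same idea would be to loop over p and call xpath('string(.)').extract() on each link, which returns that link's text whether or not it is wrapped in a span.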

Hanako answered 31/1, 2014 at 11:57 Comment(0)
