Downloading images in scrapy

I am trying to download images with Scrapy. Here are my files:

items.py

class DmozItem(Item):
    title = Field()
    image_urls = Field()
    images = Field()

settings.py

BOT_NAME = 'tutorial'

SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
IMAGES= '/home/mayank/Desktop/sc/tutorial/tutorial'

spider

class DmozSpider(BaseSpider):
    name = "wikipedia"
    allowed_domains = ["wikipedia.org"]
    start_urls = [
        "http://en.wikipedia.org/wiki/Pune"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        images=hxs.select('//a[@class="image"]')
        for image in images:
                item = DmozItem()
                link=image.select('@href').extract()[0]
                link = 'http://en.wikipedia.com'+link
                item['image_urls']=link
                items.append(item)

In spite of all these settings, my pipeline is not getting activated. Please help. I am new to this framework.

Cyler answered 16/4, 2013 at 18:39 Comment(2)
Have you installed PIL (Python Imaging Library)? It's a prerequisite for image downloads: doc.scrapy.org/en/latest/topics/images.html - Autobahn
How do you know the pipeline is not being activated? Can you include a bit of the log output, something like: 2013-04-16 16:40:31-0500 [scrapy] DEBUG: Enabled item pipelines: ImagesPipeline. - Broomrape

First, in settings.py: the setting is called IMAGES_STORE, not IMAGES, so rename it.
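
Applied to the settings from the question, the rename would look roughly like the sketch below. It assumes the same older Scrapy release as the question (the `scrapy.contrib` pipeline path) and reuses the question's storage directory unchanged. As far as I know, later Scrapy releases moved the pipeline to `scrapy.pipelines.images.ImagesPipeline` and expect `ITEM_PIPELINES` to be a dict, so adjust for your version.

```python
# settings.py (sketch based on the question's file; only the last setting changes)
BOT_NAME = 'tutorial'

SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'

# Enable the images pipeline (path as used in the question's Scrapy version).
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']

# Directory where downloaded images are stored; this was misnamed IMAGES before.
IMAGES_STORE = '/home/mayank/Desktop/sc/tutorial/tutorial'
```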

Second, in the spider: parse() must return the item so that ImagesPipeline can download the images. Note that image_urls has to be a list of URLs, not a single string:

item = DmozItem()
image_urls = hxs.select('//img/@src').extract()
item['image_urls'] = ["http:" + x for x in image_urls]
return item
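
The `"http:" + x` prefix above works when every `src` is protocol-relative (starts with `//`). If a page mixes protocol-relative, root-relative and absolute URLs, one possible alternative is to resolve each against the page URL with the standard library's `urljoin`. This is only a sketch: `absolute_image_urls` is a hypothetical helper name, and the sample `src` values are made up for illustration.

```python
from urllib.parse import urljoin  # on Python 2: from urlparse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve each src value against the URL of the page it came from."""
    return [urljoin(page_url, src) for src in srcs]


# Hypothetical sample values, not taken from the real page.
page_url = "http://en.wikipedia.org/wiki/Pune"
srcs = [
    "//upload.wikimedia.org/a.jpg",   # protocol-relative
    "/static/b.png",                  # root-relative
    "http://example.com/c.gif",       # already absolute
]

image_urls = absolute_image_urls(page_url, srcs)
```

In the spider, the result would be assigned to `item['image_urls']` in place of the list comprehension, with `response.url` as the page URL.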
Tolbutamide answered 17/4, 2013 at 12:23 Comment(0)
