I am going through the Scrapy tutorial at http://doc.scrapy.org/en/latest/intro/tutorial.html and followed it until I ran this command:
scrapy crawl dmoz
It gave me output ending with an error:
2013-08-25 13:11:42-0700 [scrapy] INFO: Scrapy 0.18.0 started (bot: tutorial)
2013-08-25 13:11:42-0700 [scrapy] DEBUG: Optional features available: ssl, http11
2013-08-25 13:11:42-0700 [scrapy] DEBUG: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
2013-08-25 13:11:42-0700 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/crawl.py", line 46, in run
    spider = self.crawler.spiders.create(spname, **opts.spargs)
  File "/Library/Python/2.7/site-packages/scrapy/command.py", line 34, in crawler
    self._crawler.configure()
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 44, in configure
    self.engine = ExecutionEngine(self, self._spider_closed)
  File "/Library/Python/2.7/site-packages/scrapy/core/engine.py", line 62, in __init__
    self.downloader = Downloader(crawler)
  File "/Library/Python/2.7/site-packages/scrapy/core/downloader/__init__.py", line 73, in __init__
    self.handlers = DownloadHandlers(crawler)
  File "/Library/Python/2.7/site-packages/scrapy/core/downloader/handlers/__init__.py", line 18, in __init__
    cls = load_object(clspath)
  File "/Library/Python/2.7/site-packages/scrapy/utils/misc.py", line 38, in load_object
    mod = __import__(module, {}, {}, [''])
  File "/Library/Python/2.7/site-packages/scrapy/core/downloader/handlers/s3.py", line 4, in <module>
    from .http import HTTPDownloadHandler
  File "/Library/Python/2.7/site-packages/scrapy/core/downloader/handlers/http.py", line 5, in <module>
    from .http11 import HTTP11DownloadHandler as HTTPDownloadHandler
  File "/Library/Python/2.7/site-packages/scrapy/core/downloader/handlers/http11.py", line 13, in <module>
    from scrapy.xlib.tx import Agent, ProxyAgent, ResponseDone, \
  File "/Library/Python/2.7/site-packages/scrapy/xlib/tx/__init__.py", line 6, in <module>
    from . import client, endpoints
  File "/Library/Python/2.7/site-packages/scrapy/xlib/tx/client.py", line 37, in <module>
    from .endpoints import TCP4ClientEndpoint, SSL4ClientEndpoint
  File "/Library/Python/2.7/site-packages/scrapy/xlib/tx/endpoints.py", line 222, in <module>
    interfaces.IProcessTransport, '_process')):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/zope/interface/declarations.py", line 495, in __call__
    raise TypeError("Can't use implementer with classes. Use one of "
TypeError: Can't use implementer with classes. Use one of the class-declaration functions instead.
I am not very familiar with Python, and I am not sure what it is complaining about.
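What I can see from the traceback is that the failure happens while Scrapy imports its own modules, inside the copy of zope.interface that ships with the OS X system Python (under /System/Library/...), not in my project code. To show what I mean, here is a small check of which copy and which version my interpreter actually imports (the pkg_resources lookup is just my guess at how to query the version):

import zope.interface
import pkg_resources

# Which copy of zope.interface does this interpreter import?
# The traceback points at the OS X system copy under /System/Library/...
print(zope.interface.__file__)

# Its installed version, assuming pkg_resources can see the distribution;
# using @implementer on classes only works on newer releases.
print(pkg_resources.get_distribution("zope.interface").version)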
Here is my domz_spider.py file:
from scrapy.spider import BaseSpider

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)
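(I copied the open(...).write(...) line from the tutorial as-is; as a side note, an equivalent version that closes the file explicitly would be:)

    def parse(self, response):
        filename = response.url.split("/")[-2]
        # 'with' closes the file even if the write raises
        with open(filename, 'wb') as f:
            f.write(response.body)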
And here is my items.py file:
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
from scrapy.item import Item, Field

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()
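For context, my understanding is that a later tutorial step fills this item in from parse, roughly like this (using the HtmlXPathSelector API that the Scrapy 0.18 docs describe; I have not gotten this far because of the error above):

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from tutorial.items import DmozItem

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = ["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        # each <li> on the listing page is one directory entry
        for site in hxs.select('//ul/li'):
            item = DmozItem()
            item['title'] = site.select('a/text()').extract()
            item['link'] = site.select('a/@href').extract()
            item['desc'] = site.select('text()').extract()
            items.append(item)
        return items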
And here is the directory structure:
scrapy.cfg
tutorial/
tutorial/items.py
tutorial/pipelines.py
tutorial/settings.py
tutorial/spiders/
tutorial/spiders/domz_spider.py
Here is the settings.py file:
BOT_NAME = 'tutorial'
SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'