Using Scrapy with Amazon S3 is fairly simple. You set:
- FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
- FEED_FORMAT = 'jsonlines'
- AWS_ACCESS_KEY_ID = [access key]
- AWS_SECRET_ACCESS_KEY = [secret key]
and everything works just fine.
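
For reference, here is a minimal sketch of what those settings might look like in the project's settings.py. The bucket name and path are the placeholders from the list above. Reading the credentials from environment variables is my assumption (the list just shows placeholders), and preview_feed_uri is a hypothetical helper, not part of Scrapy, that only illustrates how the %(name)s and %(time)s placeholders get filled in.

```python
import os

# Feed export target; MYBUCKET is the placeholder bucket name from the question.
FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
FEED_FORMAT = 'jsonlines'

# Assumption: credentials are taken from environment variables
# instead of being hardcoded in the settings file.
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID', '')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY', '')


def preview_feed_uri(spider_name, timestamp):
    """Hypothetical helper (not a Scrapy API): expand the URI placeholders
    with %-formatting to preview the resulting S3 key."""
    return FEED_URI % {'name': spider_name, 'time': timestamp}
```

Calling preview_feed_uri('myspider', '2013-01-01T00-00-00') shows the kind of S3 path a single crawl would write to.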
But Scrapyd seems to override that setting and saves the items on the server (with a link on the web site).
Adding the "items_dir =" setting doesn't seem to change anything.
What kind of setting makes it work?
EDIT: Extra info that might be relevant - we are using Scrapy-Heroku.
scrapy crawl? How did you tell scrapyd where your project settings file is? – Tenant
scrapy crawl (This tells me the S3 configuration is fine) and I just put the settings in the default location (I know it reads it well because I have the application setting there, which works fine) – Taal