Saving items from Scrapyd to Amazon S3 using Feed Exporter

Asked 11/4, 2013 at 18:3 Answered 16/5, 2018 at 6:6

amazon-s3 scrapy scrapyd

Using Scrapy with Amazon S3 is fairly simple. You set:

FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
FEED_FORMAT = 'jsonlines'
AWS_ACCESS_KEY_ID = [access key]
AWS_SECRET_ACCESS_KEY = [secret key]

and everything works just fine.
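For anyone unsure what the %(name)s and %(time)s placeholders in FEED_URI turn into, here is a rough sketch using plain Python %-formatting. Note that expand_feed_uri is a hypothetical helper written for illustration and is not part of Scrapy, and the timestamp format (ISO 8601 with colons replaced by dashes) is an assumption about what Scrapy produces.

```python
from datetime import datetime, timezone

FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'


def expand_feed_uri(template, spider_name, now=None):
    """Fill the %(name)s and %(time)s placeholders of a feed URI template.

    Hypothetical helper, for illustration only.
    """
    if now is None:
        # Naive UTC timestamp without microseconds.
        now = datetime.now(timezone.utc).replace(tzinfo=None, microsecond=0)
    params = {
        "name": spider_name,
        # Assumed format: ISO 8601 with ':' swapped for '-'.
        "time": now.isoformat().replace(":", "-"),
    }
    return template % params


if __name__ == "__main__":
    print(expand_feed_uri(FEED_URI, "myspider"))
```

Each run of a spider therefore writes to its own key under feeds/<spider name>/ in the bucket.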

But Scrapyd seems to override that setting and saves the items on the server instead (with a link to them in the web interface).

Adding the "items_dir =" setting doesn't seem to change anything.

What kind of setting makes it work?

EDIT: Extra info that might be relevant - we are using Scrapy-Heroku.

Taal answered 11/4, 2013 at 18:3 Comment(2)

Do you see anything in the Scrapyd logs? Does it save items to S3 if you run your crawler directly via scrapy crawl? How did you tell Scrapyd where your project settings file is? – Tenant 13/4, 2013 at 19:24

Nothing in the logs as far as I can see. It does save to S3 when I run scrapy crawl, which tells me the S3 configuration is fine. I just put the settings in the default location; I know they are read correctly because the application setting is also there and works fine. – Taal 14/4, 2013 at 7:12

I faced the same problem. Removing the items_dir= line from the scrapyd.conf file worked for me.

Popele answered 16/5, 2018 at 6:6 Comment(0)

You can set the items_dir property to an empty value like this:

[scrapyd]
items_dir=

It seems that when that property is set to a non-empty value, it takes precedence over the configured feed exporter. See http://scrapyd.readthedocs.org/en/latest/config.html for more information.
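If you want to double-check what your scrapyd.conf actually contains, a small sketch like the following reads the items_dir value with Python's standard configparser. The read_items_dir helper is hypothetical and written for illustration; this is not how Scrapyd itself loads its configuration, just a quick sanity check on the file contents.

```python
import configparser


def read_items_dir(conf_text):
    """Return the items_dir value from scrapyd.conf text.

    Returns '' when the option is present but empty, and None when it
    is not present at all. Hypothetical helper, for illustration only.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    value = parser.get("scrapyd", "items_dir", fallback=None)
    return None if value is None else value.strip()


if __name__ == "__main__":
    sample = "[scrapyd]\nitems_dir=\n"
    # An empty string here means items_dir is set to an empty value.
    print(repr(read_items_dir(sample)))
```

An empty string means the option is set to an empty value, as suggested above. None means the option is absent, in which case Scrapyd falls back to its default for your installed version.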

Whitelivered answered 20/9, 2013 at 20:32 Comment(0)
