Saving items from Scrapyd to Amazon S3 using Feed Exporter

Using Scrapy with Amazon S3 is fairly simple; you set:

  • FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
  • FEED_FORMAT = 'jsonlines'
  • AWS_ACCESS_KEY_ID = [access key]
  • AWS_SECRET_ACCESS_KEY = [secret key]

and everything works just fine.
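
For reference, here is the same configuration as it would appear in the project's settings.py; the bucket name and credential placeholders are illustrative, not real values:

# settings.py - minimal sketch of the setup described above
FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'  # %(name)s and %(time)s are expanded per run
FEED_FORMAT = 'jsonlines'
AWS_ACCESS_KEY_ID = '[access key]'
AWS_SECRET_ACCESS_KEY = '[secret key]'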

But Scrapyd seems to override that setting and saves the items on the server instead (with a link in its web interface).

Adding the "items_dir =" setting doesn't seem to change anything.

What setting makes this work?

EDIT: Extra info that might be relevant - we are using Scrapy-Heroku.

Taal answered 11/4, 2013 at 18:3
Do you see anything in the scrapyd logs? Does it save items to S3 if you run your crawler directly via scrapy crawl? How did you tell scrapyd where your project settings file is? – Tenant
Nothing in the logs as far as I can see. It does save to S3 when I run scrapy crawl (which tells me the S3 configuration is fine), and I put the settings in the default location (I know it reads them, because the application settings there work fine). – Taal

I also faced the same problem. Removing the items_dir= line from the scrapyd.conf file worked for me.

Popele answered 16/5, 2018 at 6:6

You can set the items_dir property to an empty value like this:

[scrapyd]
items_dir=

It seems that when that property is set, it takes precedence over the configured feed exporter. See http://scrapyd.readthedocs.org/en/latest/config.html for more information.
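
For context, a minimal scrapyd.conf along these lines might look like the sketch below; aside from the empty items_dir, the keys shown are standard Scrapyd options at their usual defaults:

[scrapyd]
eggs_dir = eggs
logs_dir = logs
# leaving items_dir empty disables local item storage,
# so the project's FEED_URI takes effect
items_dir =
jobs_to_keep = 5
http_port = 6800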

Whitelivered answered 20/9, 2013 at 20:32
