Scrapyd Questions

3

Solved

Framework Scrapy - Scrapyd server. I have a problem getting the jobid value inside the spider. After POSTing data to http://localhost:6800/schedule.json, the response is status = ok, jobid = bc2...
Basinger asked 11/3, 2012 at 4:28
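One common way to recover the job id, assuming the question concerns a Scrapyd-launched run: Scrapyd forwards the jobid it returned from schedule.json to the spider as a `_job` keyword argument. A minimal plain-Python sketch of that argument capture (a stand-in for a real scrapy.Spider subclass, so the pattern is visible without Scrapy installed):

```python
class JobAwareSpider:
    """Stand-in for a scrapy.Spider subclass: Scrapyd passes the jobid it
    returned from schedule.json to the spider as the `_job` keyword
    argument, so capturing it in __init__ makes it available everywhere."""
    name = "job_aware"

    def __init__(self, *args, **kwargs):
        # A real spider would also call super().__init__(*args, **kwargs).
        self.jobid = kwargs.get("_job")


# Scrapyd constructs the spider roughly like this (jobid is illustrative):
spider = JobAwareSpider(_job="bc2a3c4d5e6f")
```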

2

Solved

I searched a lot on this; it may have a simple solution that I am missing. I have set up scrapy + scrapyd on both my local machine and my server. They both work OK when I try "scrapyd". I can d...
Hole asked 15/7, 2017 at 19:38

2

Solved

I have created a couple of web spiders that I intend to run simultaneously with scrapyd. I first successfully installed scrapyd on Ubuntu 14.04 using the command pip install scrapyd, and when I r...
Medicine asked 14/7, 2015 at 5:31

2

Using Scrapy with Amazon S3 is fairly simple; you set: FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl', FEED_FORMAT = 'jsonlines', AWS_ACCESS_KEY_ID = [access key], AWS_SECRET_ACCESS_KEY = [s...
Taal asked 11/4, 2013 at 18:3
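The settings from the teaser above, laid out as they would appear in a project's settings.py (bucket name and credentials are placeholders; these are the classic FEED_* settings from Scrapy of that era, not the newer FEEDS dict):

```python
# settings.py -- export scraped items to S3 as JSON Lines.
# MYBUCKET and both credentials are placeholders.
FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'  # %(name)s = spider name
FEED_FORMAT = 'jsonlines'
AWS_ACCESS_KEY_ID = '<access key>'
AWS_SECRET_ACCESS_KEY = '<secret key>'
```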

5

Solved

How can I get the request URL in Scrapy's parse() function? I have a lot of URLs in start_urls and some of them redirect my spider to the homepage, and as a result I have an empty item. So I need somethin...
Waterer asked 19/11, 2013 at 20:7

4

Solved

I am trying to run Scrapyd on a virtual Ubuntu 16.04 server, to which I connect via SSH. When I start Scrapyd by simply running $ scrapyd, I can connect to the web interface by going to http://82.16...
Progression asked 1/11, 2017 at 23:27
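For the remote-access part of this question, the usual lever is Scrapyd's bind_address setting, which defaults to 127.0.0.1. A scrapyd.conf sketch (the port shown is the default; expose 0.0.0.0 only behind a firewall or SSH tunnel):

```ini
# scrapyd.conf (Scrapyd also reads /etc/scrapyd/scrapyd.conf, among others)
[scrapyd]
# 127.0.0.1 by default; 0.0.0.0 listens on all interfaces so the web UI
# is reachable from outside the server.
bind_address = 0.0.0.0
http_port    = 6800
```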

1

What tool or set of tools would you use for horizontally scaling scrapyd: adding new machines to a scrapyd cluster dynamically and having N instances per machine if required? It is not necessary for al...
Complicacy asked 24/7, 2015 at 18:39

1

Solved

I had just installed scrapyd-client (1.1.0) in a virtualenv and ran the command 'scrapyd-deploy' successfully, but when I ran 'scrapyd-client', the terminal said: command not found: scrapyd-client. ...
Sharitasharity asked 18/8, 2017 at 7:19

3

Scrapy is pretty cool; however, I found the documentation to be very bare-bones, and some simple questions were tough to answer. After putting together various techniques from various Stack Overflow answers, I ...
Jam asked 25/1, 2014 at 0:47

1

Solved

We've been using the Scrapyd service for a while up until now. It provides a nice wrapper around a scrapy project and its spiders, letting you control the spiders via an HTTP API: Scrapyd is a service...
Absently asked 17/5, 2016 at 18:16

4

Solved

I have multiple spiders in my project folder and want to run all the spiders at once, so I decided to run them using the scrapyd service. I started doing this by following the guide here. First of all, I am in ...
Sweatbox asked 6/7, 2012 at 12:48

0

I have an application which schedules scrapy crawl jobs via scrapyd. Items flow nicely to the DB, and I can monitor the job status via the listjobs.json endpoint. So far so good, and I can even know ...
Doublespace asked 3/3, 2016 at 16:44
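For reference, listjobs.json returns a JSON object with pending, running, and finished arrays; a small stdlib sketch for summarizing such a response (the sample payload below is shaped like Scrapyd's response, not captured from a real server):

```python
import json

def summarize_jobs(listjobs_body: str) -> dict:
    """Count jobs per state from a Scrapyd listjobs.json response body."""
    data = json.loads(listjobs_body)
    return {state: len(data.get(state, []))
            for state in ("pending", "running", "finished")}

# Example payload shaped like Scrapyd's response:
sample = ('{"status": "ok", "pending": [], '
          '"running": [{"id": "abc123", "spider": "myspider"}], '
          '"finished": []}')
```

In a real deployment the body would come from an HTTP GET such as urllib.request.urlopen("http://localhost:6800/listjobs.json?project=myproject").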

4

I've written a working crawler using scrapy; now I want to control it through a Django webapp, that is to say: set one or several start_urls, set one or several allowed_domains, set settings values, st...
Headdress asked 21/10, 2012 at 10:10

1

I can run a spider in scrapy with the simple command scrapy crawl custom_spider -a input_val=5 -a input_val2=6, where input_val and input_val2 are the values I'm passing to the spider, and the abov...
Anglaangle asked 26/8, 2015 at 10:20
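The Scrapyd equivalent, as a hedged sketch: any POST field sent to schedule.json that Scrapyd does not reserve for itself (project, spider, setting, jobid, ...) is forwarded to the spider as a keyword argument, mirroring -a. Project and spider names below are placeholders:

```python
from urllib.parse import urlencode

# Equivalent of `scrapy crawl custom_spider -a input_val=5 -a input_val2=6`
# via Scrapyd: extra POST fields become spider keyword arguments.
payload = urlencode({
    "project": "myproject",      # placeholder project name
    "spider": "custom_spider",
    "input_val": "5",
    "input_val2": "6",
})

# To actually submit it (requires a running Scrapyd on localhost:6800):
# import urllib.request
# urllib.request.urlopen("http://localhost:6800/schedule.json",
#                        data=payload.encode())
```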

1

My spider has a serious memory leak. After 15 minutes of running, its memory usage is 5 GB, and scrapy reports (using prefs()) that there are 900k request objects and that's all. What can be the reason for this high numb...
Marx asked 23/7, 2015 at 17:19

2

Solved

Context: I am running scrapyd 1.1 + scrapy 0.24.6 with a single "selenium-scrapy hybrid" spider that crawls over many domains according to parameters. The development machine that hosts scrapyd's i...
Christ asked 5/6, 2015 at 17:56

2

How could I pass a username and password from the command line? Thanks! class LoginSpider(Spider): name = 'example.com' start_urls = ['http://www.example.com/users/login.php'] def parse(self, re...
Epiclesis asked 13/1, 2014 at 20:55

2

Solved

I'm using scrapy for a project where I want to scrape a number of sites - possibly hundreds - and I have to write a specific spider for each site. I can schedule one spider in a project deployed to...
Herstein asked 29/5, 2012 at 14:23

1

Solved

Hey, so I have about 50 spiders in my project and I'm currently running them via a scrapyd server. I'm running into an issue where some of the resources I use get locked and make my spiders fail or go...
Brophy asked 25/7, 2014 at 16:27

1

Solved

I would like to know how to ignore items that don't fill all fields, some kind of dropping, because in the output of scrapyd I'm getting pages that don't fill all fields. I have this code: class P...
Grassquit asked 22/5, 2014 at 15:7

1

Solved

I am new to scrapy and scrapyd. I did some reading and developed my crawler, which crawls a news website and gives me all the news articles from it. If I run the crawler simply with scrapy crawl proje...
Intersexual asked 11/2, 2014 at 5:43

1

Solved

I am using the Scrapy framework to make spiders crawl through some webpages. Basically, what I want is to scrape web pages and save them to a database. I have one spider per webpage. But I am having trou...
Navar asked 11/2, 2014 at 6:7

2

Solved

Can somebody please guide me through a step-by-step procedure on how to eggify my existing Python project? The documentation keeps mentioning something about setup.py within a package, but I cannot find...
Jansson asked 9/12, 2013 at 0:30
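A minimal setup.py sketch for eggifying a Scrapy project (project and package names are placeholders; the scrapy entry point is what tells Scrapyd where the settings module lives, and it is what scrapyd-deploy generates automatically):

```python
# setup.py -- run `python setup.py bdist_egg` to build the .egg.
from setuptools import setup, find_packages

setup(
    name="myproject",            # placeholder project name
    version="1.0",
    packages=find_packages(),
    # Tells Scrapy/Scrapyd which settings module the project uses:
    entry_points={"scrapy": ["settings = myproject.settings"]},
)
```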

1

I am going through the scrapy tutorial http://doc.scrapy.org/en/latest/intro/tutorial.html and followed it until I ran this command: scrapy crawl dmoz. And it gave me output with an error: 2013-0...
Escutcheon asked 25/8, 2013 at 20:16

1

Solved

The scrapy docs say: Scrapy comes with a built-in service, called “Scrapyd”, which allows you to deploy (aka. upload) your projects and control their spiders using a JSON web service. Is ...
Natie asked 16/4, 2013 at 10:19

© 2022 - 2024 — McMap. All rights reserved.