scrapyd Questions
3
Solved
Framework Scrapy - Scrapyd server.
I have a problem getting the jobid value inside the spider.
After POSTing data to http://localhost:6800/schedule.json, the response is
status = ok
jobid = bc2...
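For the question above: scrapyd hands the job id to the crawl process it launches, so a spider can read it without parsing the schedule.json response. A minimal sketch, assuming scrapyd's default behaviour of passing the id as the `_job` spider argument and exporting it as the `SCRAPY_JOB` environment variable:

```python
import os

def current_jobid(spider_kwargs):
    """Return the scrapyd job id, trying both delivery channels.

    1) scrapyd schedules the crawl with -a _job=<id>, so the id shows up
       in the spider's constructor kwargs;
    2) it also sets SCRAPY_JOB in the child process environment.
    """
    jobid = spider_kwargs.get("_job")
    return jobid or os.environ.get("SCRAPY_JOB")
```

Inside a real spider you would call this from `__init__` with the received `**kwargs`; since `scrapy.Spider.__init__` copies kwargs onto the instance, `self._job` is usually available directly as well.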
2
Solved
I searched a lot on this, it may have a simple solution that I am missing.
I have set up scrapy + scrapyd on both my local machine and my server. They both work OK when I try "scrapyd".
I can d...
2
Solved
I have created a couple of web spiders that I intend to run simultaneously with scrapyd. I first successfully installed scrapyd in Ubuntu 14.04 using the command:
pip install scrapyd, and when I r...
Medicine asked 14/7, 2015 at 5:31
2
Using Scrapy with Amazon S3 is fairly simple; you set:
FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
FEED_FORMAT = 'jsonlines'
AWS_ACCESS_KEY_ID = [access key]
AWS_SECRET_ACCESS_KEY = [s...
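As a side note to the settings above: in newer Scrapy versions (2.1+) the `FEED_URI`/`FEED_FORMAT` pair is deprecated in favour of the `FEEDS` dictionary. A hedged equivalent of the excerpt's configuration (bucket name and credentials are placeholders):

```python
# settings.py -- FEEDS-style equivalent of FEED_URI/FEED_FORMAT (Scrapy 2.1+)
FEEDS = {
    "s3://MYBUCKET/feeds/%(name)s/%(time)s.jl": {"format": "jsonlines"},
}
AWS_ACCESS_KEY_ID = "<access key>"      # placeholder
AWS_SECRET_ACCESS_KEY = "<secret key>"  # placeholder
```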
5
Solved
How can I get the request url in Scrapy's parse() function? I have a lot of urls in start_urls and some of them redirect my spider to the homepage, and as a result I have an empty item. So I need somethin...
Waterer asked 19/11, 2013 at 20:7
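For the question above, the answer is essentially `response.url` (the final URL after redirects) versus `response.request.url` (the URL originally requested). A minimal sketch of the distinction, written as a plain function rather than a full `scrapy.Spider` method:

```python
def parse(self, response):
    """Record both the final URL and the originally requested one.

    With Scrapy's redirect middleware enabled, response.url can differ
    from response.request.url after a redirect.
    """
    return {
        "final_url": response.url,              # page actually received
        "requested_url": response.request.url,  # URL originally scheduled
    }
```

In a real spider, `response.meta.get("redirect_urls")` also lists the intermediate URLs when a redirect happened.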
4
Solved
I am trying to run Scrapyd on a virtual Ubuntu 16.04 server, to which I connect via SSH. When I run Scrapyd by simply running
$ scrapyd
I can connect to the web interface by going to http://82.16...
1
What tool or set of tools would you use for horizontally scaling scrapyd, adding new machines to a scrapyd cluster dynamically and having N instances per machine if required? It is not necessary for al...
Complicacy asked 24/7, 2015 at 18:39
1
Solved
I'd just installed scrapyd-client (1.1.0) in a virtualenv and ran the command 'scrapyd-deploy' successfully, but when I ran 'scrapyd-client', the terminal said: command not found: scrapyd-client.
...
Sharitasharity asked 18/8, 2017 at 7:19
3
Scrapy is pretty cool; however, I found the documentation to be very bare bones, and some simple questions were tough to answer. After putting together various techniques from various Stack Overflow answers I ...
1
Solved
We've been using the Scrapyd service for a while up until now. It provides a nice wrapper around a scrapy project and its spiders, letting you control the spiders via an HTTP API:
Scrapyd is a service...
Absently asked 17/5, 2016 at 18:16
4
Solved
I had multiple spiders in my project folder and wanted to run all the spiders at once, so I decided to run them using the scrapyd service.
I started doing this by looking here.
First of all I am in ...
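The flow this question describes (run every spider in a project through the scrapyd service) can be sketched against scrapyd's documented JSON endpoints, `listspiders.json` and `schedule.json`. The host and project names below are placeholders:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SCRAPYD = "http://localhost:6800"  # assumed default scrapyd address

def schedule_body(project, spider):
    """Form-encoded POST body for schedule.json."""
    return urlencode({"project": project, "spider": spider}).encode()

def schedule_all(project):
    """Schedule every spider in `project`; returns {spider: jobid}."""
    with urlopen(f"{SCRAPYD}/listspiders.json?{urlencode({'project': project})}") as r:
        spiders = json.loads(r.read())["spiders"]
    jobs = {}
    for spider in spiders:
        req = Request(f"{SCRAPYD}/schedule.json", data=schedule_body(project, spider))
        with urlopen(req) as r:
            jobs[spider] = json.loads(r.read())["jobid"]
    return jobs
```

Each `schedule.json` call returns a response like the one quoted at the top of this page (`status = ok`, `jobid = ...`).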
0
I have an application which schedules scrapy crawl jobs via scrapyd.
Items flow nicely to the DB, and I can monitor the job status via the listjobs.json endpoint. So far so good, and I can even know ...
4
I've written a working crawler using scrapy;
now I want to control it through a Django webapp, that is to say:
Set 1 or several start_urls
Set 1 or several allowed_domains
Set settings values
St...
1
I can run a spider in scrapy with a simple command
scrapy crawl custom_spider -a input_val=5 -a input_val2=6
where input_val and input_val2 are the values I'm passing to the spider
and the abov...
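The mechanics behind the excerpt: each `-a name=value` pair is handed to the spider's constructor as a keyword argument, always as a string. A minimal sketch (a real spider would subclass `scrapy.Spider`; the int conversion is an assumption about what the values mean):

```python
class CustomSpider:  # stands in for scrapy.Spider in this sketch
    name = "custom_spider"

    def __init__(self, input_val=None, input_val2=None, **kwargs):
        # -a values always arrive as strings, so convert explicitly
        self.input_val = int(input_val) if input_val is not None else None
        self.input_val2 = int(input_val2) if input_val2 is not None else None
```

When scheduling through scrapyd, the same arguments ride along as extra POST parameters to schedule.json.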
1
My spider has a serious memory leak. After 15 minutes of running, its memory is at 5 GB, and scrapy tells me (using prefs()) that there are 900k request objects and that's all. What can be the reason for this high numb...
Marx asked 23/7, 2015 at 17:19
2
Solved
Context
I am running scrapyd 1.1 + scrapy 0.24.6 with a single "selenium-scrapy hybrid" spider that crawls over many domains according to parameters.
The development machine that hosts scrapyd's i...
2
How could I pass username and password from the command line? Thanks!
class LoginSpider(Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, re...
Epiclesis asked 13/1, 2014 at 20:55
2
Solved
I'm using scrapy for a project where I want to scrape a number of sites - possibly hundreds - and I have to write a specific spider for each site. I can schedule one spider in a project deployed to...
Herstein asked 29/5, 2012 at 14:23
1
Solved
Hey so I have about 50 spiders in my project and I'm currently running them via scrapyd server. I'm running into an issue where some of the resources I use get locked and make my spiders fail or go...
Brophy asked 25/7, 2014 at 16:27
1
Solved
I would like to know how to ignore items that don't fill all fields, some kind of dropping, because in the output of scrapyd I'm getting pages that don't fill all fields.
I have that code:
class P...
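The standard answer to the question above is an item pipeline that raises `DropItem` when required fields are empty. A sketch with a stand-in exception so the snippet is self-contained; in a real project you would raise `scrapy.exceptions.DropItem` and register the class in `ITEM_PIPELINES` (the field names are hypothetical):

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem."""

REQUIRED_FIELDS = ("title", "url")  # hypothetical required fields

class RequiredFieldsPipeline:
    def process_item(self, item, spider):
        # drop any item where a required field is missing or empty
        missing = [f for f in REQUIRED_FIELDS if not item.get(f)]
        if missing:
            raise DropItem(f"missing fields: {missing}")
        return item
```

Dropped items show up in scrapyd's log as warnings instead of reaching the feed output.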
1
Solved
I am new to scrapy and scrapyd. Did some reading and developed my crawler which crawls a news website and gives me all the news articles from it. If I run the crawler simply by
scrapy crawl proje...
1
Solved
I am using Scrapy framework to make spiders crawl through some webpages. Basically, what I want is to scrape web pages and save them to database. I have one spider per webpage. But I am having trou...
2
Solved
Can somebody please guide me with a step-by-step procedure on how to eggify my existing python project? The documentation keeps mentioning something about setup.py within a package but I cannot find...
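For the egg question above: `scrapyd-deploy` builds the egg from a `setup.py` at the project root. A minimal sketch of what that file usually looks like for a Scrapy project (project and package names are placeholders); the `scrapy` entry point telling scrapyd where the settings module lives is the important part:

```python
# setup.py -- minimal packaging config for a Scrapy project egg
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="1.0",
    packages=find_packages(),
    # scrapyd locates the project settings through this entry point
    entry_points={"scrapy": ["settings = myproject.settings"]},
)
```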
1
I am going through the scrapy tutorial http://doc.scrapy.org/en/latest/intro/tutorial.html
and I followed it till I ran this command
scrapy crawl dmoz
And it gave me output with an error
2013-0...
1
Solved
The scrapy doc says that:
Scrapy comes with a built-in service, called “Scrapyd”, which allows you to deploy (aka. upload) your projects and control their spiders using a JSON web service.
is ...
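The deploy workflow the doc excerpt describes is driven by the `[deploy]` section of the project's `scrapy.cfg`. A hedged example (URL and project name are placeholders):

```ini
# scrapy.cfg -- deploy target consumed by scrapyd-deploy
[settings]
default = myproject.settings

[deploy]
url = http://localhost:6800/
project = myproject
```

With this in place, `scrapyd-deploy` eggifies the project and uploads it to the scrapyd instance at `url` via the addversion.json endpoint.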
© 2022 - 2024 — McMap. All rights reserved.