Scrapyd jobid value inside spider
Framework Scrapy - Scrapyd server.

I have a problem getting the jobid value inside the spider.

After POSTing data to http://localhost:6800/schedule.json, the response is:

status = ok
jobid = bc2096406b3011e1a2d0005056c00008

But I need to use this jobid inside the current spider while it runs. It could be used to open a {jobid}.log file or for other dynamic purposes.
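For context, the schedule.json endpoint returns a JSON body, so the jobid is easy to capture on the scheduling side; a minimal sketch using the response values shown above:

```python
import json

# Example response body from Scrapyd's schedule.json
# (values taken from the question above).
response_body = '{"status": "ok", "jobid": "bc2096406b3011e1a2d0005056c00008"}'

data = json.loads(response_body)
jobid = data["jobid"]
```

This only helps the client that scheduled the job, though; the question is about reaching the same value from inside the spider process.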

class SomeSpider(BaseSpider):
    name = "some"
    start_urls = ["http://www.example.com/"]
    def parse(self, response):
        items = []
        for val in values:
            item = SomeItem()
            item['jobid'] = self.jobid # ???!
            items.append(item)
        return items

But I only see this jobid after the task has finished :( Thanks!

Basinger answered 11/3, 2012 at 4:28 Comment(0)

I guess there is an easier way, but you can extract the job id from the command-line arguments. IIRC, Scrapyd launches a spider passing it the jobid as a parameter. Just inspect sys.argv wherever you need the jobid.
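A sketch of that idea: scan sys.argv for the `_job=<id>` argument Scrapyd passes as a spider parameter. The exact position in argv isn't guaranteed, so scanning is safer than a fixed index:

```python
import sys

def jobid_from_argv(argv=None):
    """Return the Scrapyd job id from a '_job=<id>' command-line
    argument, or None when running outside Scrapyd."""
    for arg in (argv if argv is not None else sys.argv):
        if arg.startswith("_job="):
            return arg.split("=", 1)[1]
    return None
```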

Goa answered 11/3, 2012 at 13:48 Comment(6)
All genius is easy ;) Thanks, mate! Some example: if len(sys.argv) > 3 and '_job' in sys.argv[3]: self.jobid = sys.argv[3].rsplit('=', 1)[1]Basinger
@Maxim, glad it worked. Please, don't forget to accept and upvote answers that worked for you.Goa
It requires 15 points of reputation. I'll come back to this post after some growth ;) Thank you.Basinger
You can also get it from the SCRAPY_JOB environment variable: os.environ['SCRAPY_JOB']Monochromatic
@PabloHoffman what happens if we have multiple schedules running? I get the jobids but I am not sure if they will be correct every time.Hutcherson
This is now os.environ['SHUB_JOBKEY']Bisson

You can get it from the SCRAPY_JOB environment variable:

os.environ['SCRAPY_JOB']
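For example, wrapped with a fallback so the spider also runs outside Scrapyd (the default value here is just an illustration):

```python
import os

def current_jobid(default="standalone"):
    # Scrapyd sets SCRAPY_JOB in the spider's environment;
    # fall back to a default when running outside Scrapyd.
    return os.environ.get("SCRAPY_JOB", default)
```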
Boutin answered 8/1, 2015 at 6:32 Comment(1)
This is now os.environ['SHUB_JOBKEY']Bisson

In spider.py:

class SomeSpider(BaseSpider):
    name = "some"
    start_urls = ["http://www.example.com/"]

    def __init__(self, *args, **kwargs):
        super(SomeSpider, self).__init__(*args, **kwargs)
        self.jobid = kwargs.get('_job')

    def parse(self, response):
        items = []
        for val in values:  # 'values' is assumed to come from parsing the response
            item = SomeItem()
            item['jobid'] = self.jobid  # set in __init__ from the '_job' kwarg
            items.append(item)
        return items
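The kwargs pattern above can be sketched without Scrapy at all: Scrapyd passes `-a _job=<id>`, which Scrapy forwards to the spider's __init__ as a keyword argument. The class name here is hypothetical, a minimal stand-in for the spider:

```python
class JobAwareSpider:
    """Minimal stand-in for the Scrapy spider above: the '_job'
    spider argument arrives as a keyword argument to __init__."""
    def __init__(self, *args, **kwargs):
        self.jobid = kwargs.get("_job")  # None outside Scrapyd

spider = JobAwareSpider(_job="bc2096406b3011e1a2d0005056c00008")
```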
Daytime answered 12/4, 2021 at 8:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.