I currently have a scheduled console command that runs every 5 minutes without overlap like this:
$schedule->command('crawler')
    ->everyFiveMinutes()
    ->withoutOverlapping()
    ->sendOutputTo('../_laravel/storage/logs/scheduler-log.txt');
It works great, but I currently have about 220 pages, which take about 3 hours to finish in 5-minute increments, because I force it to crawl only 10 pages per interval; each page takes 20-30 seconds to crawl due to various factors. Each page is a record in the database. If I end up with 10,000 pages to crawl, this method would not work: a full pass would take more than 24 hours, and each page is supposed to be re-crawled once a day.
My vendor allows up to 10 concurrent requests (or more with higher plans), so what's the best way to run the crawl concurrently? If I just duplicate the scheduler code, does it run the same command twice, or 10 times if I duplicate it 10 times? Would that cause any issues?
I would also need to pass a parameter to the console command, such as 1, 2, 3, etc., which I could use to determine which pages to crawl. For example, 1 would be records 1-10, 2 would be records 11-20, and so on.
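To make the mapping I have in mind concrete, here is a minimal sketch in plain PHP. The helper name `sequenceToSlice` and the default batch size of 10 are my own assumptions for illustration; nothing here is part of Laravel.

```php
<?php

declare(strict_types=1);

/**
 * Map a 1-based sequence number to the slice of records it should crawl.
 * Hypothetical helper for illustration only, not a Laravel function.
 */
function sequenceToSlice(int $sequence, int $batchSize = 10): array
{
    if ($sequence < 1 || $batchSize < 1) {
        throw new InvalidArgumentException('sequence and batchSize must be >= 1');
    }

    // Sequence 1 starts at offset 0, sequence 2 at offset 10, and so on.
    $offset = ($sequence - 1) * $batchSize;

    return [
        'offset' => $offset,              // rows to skip in the query
        'limit'  => $batchSize,           // rows to take
        'first'  => $offset + 1,          // 1-based position of the first record
        'last'   => $offset + $batchSize, // 1-based position of the last record
    ];
}
```

The `offset` and `limit` values could then be fed to whatever query fetches the page records, assuming the records are read in a stable order.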
Using this Stack Overflow answer, I think I know how to pass it along, like this:
$schedule->command('crawler --sequence=1')
But how do I read that parameter within my Command class? Does it just become a regular PHP variable, i.e. $sequence?
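For context, this is roughly what I would expect the command to look like, as a sketch only. It assumes a Laravel version where console commands declare their input through the `$signature` property and read options with `$this->option()`; the class name `Crawler`, the description text, and the batch size of 10 are placeholders of mine, and older versions may use `fire()` instead of `handle()`.

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;

class Crawler extends Command
{
    // {--sequence=1} declares an option named "sequence" with a default of 1.
    protected $signature = 'crawler {--sequence=1 : Which batch of 10 pages to crawl}';

    protected $description = 'Crawl one batch of pages';

    public function handle()
    {
        // Options arrive as strings, so cast before doing arithmetic.
        $sequence = (int) $this->option('sequence');

        // Assumed batch size of 10, matching the 1-10, 11-20 scheme described above.
        $offset = ($sequence - 1) * 10;

        $this->info('Crawling records ' . ($offset + 1) . '-' . ($offset + 10));

        return 0;
    }
}
```

If that is right, the value would not appear as a bare `$sequence` variable on its own; it would have to be fetched from the command's input inside the class.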