I have a databases table with ~50K rows in it, each row represents a job that need to be done. I have a program that extracts a job from the DB, does the job and puts the result back in the db. (this system is running right now)
Now I want to allow more than one processing task to do jobs but be sure that no task is done twice (as a performance concern not that this will cause other problems). Because the access is by way of a stored procedure, my current though is to replace said stored procedure with something that looks something like this
update tbl
set owner = connection_id()
where available and owner is null limit 1;
select stuff
from tbl
where owner = connection_id();
BTW; worker's tasks might drop there connection between getting a job and submitting the results. Also, I don't expect the DB to even come close to being the bottle neck unless I mess that part up (~5 jobs per minute)
Are there any issues with this? Is there a better way to do this?
Note: the "Database as an IPC anti-pattern" is only slightly apropos here because
- I'm not doing IPC (there is no process generating the rows, they all already exist right now) and
- the primary gripe described for that anti-pattern is that it results in unneeded load on the DB as processes wait for messages (in my case, if there are no messages, everything can shutdown as everything is done)