Row Rotation for MySQL Table Consumer Processes in Round Robin

We have automation that logs into different websites and performs operations on website accounts. It is required that only one bot be logged into a specific account at a time, to avoid inconsistency in the automation flow.

We have a table in which we have stored website credentials:

| id | email           | locked_at               | last_fetched_at         |
|----|-----------------|-------------------------|-------------------------|
| 1  | [email protected] | 2020-09-14 14:35:35 UTC | 2020-09-14 14:35:35 UTC |
| 2  | [email protected] | NULL                    | NULL                    |
| 3  | [email protected] | 2020-09-14 14:35:35 UTC | 2020-09-14 14:35:35 UTC |
| 4  | [email protected] | NULL                    | NULL                    |

To be precise, we fetch credentials with this query:

SELECT `credentials`.* FROM `credentials` WHERE `credentials`.`locked_at` IS NULL ORDER BY last_fetched_at ASC LIMIT 1

Then we update the locked_at field with the current time, which locks the credential row so the next process skips it.

This happens in a Node.js app with MySQL as the backend database, accessed by multiple bot processes concurrently. We want to make sure two processes don't get the same credentials. We have tried transactions and SELECT ... FOR UPDATE to make this operation atomic, but have not found a query that works reliably so far.
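One way to make the fetch-and-claim step atomic, assuming MySQL 8.0+ on InnoDB (SKIP LOCKED is not available in 5.7), is to do the SELECT and the UPDATE inside one transaction with FOR UPDATE SKIP LOCKED. A sketch against the table from the question:

```sql
-- Sketch, assuming MySQL 8.0+ and InnoDB.
START TRANSACTION;

-- Pick the least recently used unlocked row. SKIP LOCKED makes
-- concurrent transactions skip rows already row-locked by others,
-- instead of blocking on them or returning the same row.
SELECT id, email
  FROM credentials
 WHERE locked_at IS NULL
 ORDER BY last_fetched_at ASC
 LIMIT 1
 FOR UPDATE SKIP LOCKED;

-- Claim the row before the row lock is released
-- (substitute the id returned by the SELECT above).
UPDATE credentials
   SET locked_at       = UTC_TIMESTAMP(),
       last_fetched_at = UTC_TIMESTAMP()
 WHERE id = ?;

COMMIT;
```

When the bot finishes, it releases the row with `UPDATE credentials SET locked_at = NULL WHERE id = ?`. On MySQL 5.7, plain FOR UPDATE still prevents double-claiming, but concurrent bots will block on the same row instead of skipping to the next free one.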

We are open to any third-party integration, such as Redis, or anything in Node that we could use to achieve this.

Thank you for your time.

Inaudible answered 7/10, 2020 at 20:27 Comment(6)
What is the credentials:bots ratio? As I understand it, the same free credentials may be reused by different bots many times? – Sopping
You should create and update an additional column, e.g. bot_connection, holding the CONNECTION_ID() of the process that reserves the row. A bot tries to update the row with its own connection ID, then checks whether that succeeded. If the row now carries its ID, the row is successfully reserved; if not, another bot reserved it concurrently and the current bot must try another row. The locked_at column must be updated too; this lets a service procedure find and reset rows if something happens to the reserving bot (it hangs, for example). – Brainstorming
@AndreiKovrov Yes, the process that obtained the credentials will also unlock them, but that's not an issue: only the process that owns the row runs the unlock query, so it is always one process per row. – Inaudible
We can use FOR UPDATE with the SELECT statement for row-level locking. – Trenton
I think what @HassanAkram is trying to say is that there's some kind of race condition: if two requests access the table at the same time, they get the same credentials. – Trenton
Is this any help? https://mcmap.net/q/225181/-output-clause-in-mysql – Caucus

Since you are open to third-party integrations, Redis is a good choice for distributed locking.

You can achieve the desired behaviour by creating an entry in Redis for each website.

The SETNX command is useful for locking, as it returns 0 if the key is already set. Note, however, that SETNX cannot attach an expiry atomically; a follow-up EXPIRE call can be lost if the bot crashes in between. Prefer the SET command with the NX and EX options, which sets the value and the expiry in one atomic step.

Whenever a bot starts processing a website, run:

SET WEBSITE_NAME "BOT_ID" NX EX <expiry_seconds>

A reasonable expiry is roughly twice the average time it takes to process a website.

A reply of OK means the website was free and this bot now holds the lock; a nil reply means another bot is already processing it.
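A minimal sketch of that flow in redis-cli syntax (the key name, bot ID, and 600-second expiry are placeholders). Releasing via a small Lua script ensures a bot only deletes a lock it still owns, which matters if its lock expired and was taken over by another bot in the meantime:

```
# Acquire: replies OK only if the key does not already exist,
# and sets a 600-second expiry atomically with the value.
SET lock:example.com "bot-42" NX EX 600

# Release: delete the key only if it still holds our bot ID,
# checked and deleted atomically inside one Lua script.
EVAL "if redis.call('GET', KEYS[1]) == ARGV[1] then return redis.call('DEL', KEYS[1]) else return 0 end" 1 lock:example.com "bot-42"
```

A plain DEL on release would work most of the time, but after an expiry it could remove a lock that now belongs to a different bot; the GET-then-DEL script closes that gap.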

Upon completion of processing, the bot should delete its entry in Redis.

The expiry removes the lock automatically, even if the bot fails to delete it.

Redis executes commands sequentially in a single thread on the server side, so each lock attempt is atomic and concurrency is handled for you.

Kostroma answered 11/11, 2021 at 14:21 Comment(0)

The challenge here is handling the various exceptions that will disrupt the expected flow, and recovering from them. To design the actual solution, you need to take into account average processing times, how many bots work on how many websites, and the severity of failures, including whether they can be repaired by a side process. If the websites are within your control (not third-party sites), I would instead prefer a messaging (pub/sub) solution: your infrastructure notifies an agent on the website to handle the update, and that agent ensures only a single update takes place at a time (per your requirement).

If that type of setup is not possible, then your next best bet is something like what @Akina is suggesting, combined with a recovery action for every pitfall that might occur: race conditions, bots timing out or returning incomplete tasks, websites returning unexpected responses, and so on. This can get tiring over time if nobody keeps an eye on the process and adjusts it to handle every unexpected surprise you're bound to see over the long term.
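For the database-side approach, one such recovery action is a periodic sweep that frees rows whose locks are older than the longest plausible bot run, so credentials held by crashed or hung bots return to the pool. A sketch against the question's table; the 30-minute threshold is an assumption you would tune to your own processing times:

```sql
-- Free credentials locked longer than a bot could plausibly run.
-- Run periodically (cron, event scheduler, or a service process).
UPDATE credentials
   SET locked_at = NULL
 WHERE locked_at IS NOT NULL
   AND locked_at < UTC_TIMESTAMP() - INTERVAL 30 MINUTE;
```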

Arlenarlena answered 12/10, 2020 at 20:44 Comment(0)
