MySQL: Transactions vs Locking Tables
Asked Answered
A

7

144

I'm a bit confused with transactions vs locking tables to ensure database integrity and make sure a SELECT and UPDATE remain in sync and no other connection interferes with it. I need to:

SELECT * FROM table WHERE (...) LIMIT 1

if (condition passes) {
   // Update row I got from the select 
   UPDATE table SET column = "value" WHERE (...)

   ... other logic (including INSERT some data) ...
}

I need to ensure that no other queries will interfere and perform the same SELECT (reading the 'old value' before that connection finishes updating the row.

I know I can default to LOCK TABLES table to just make sure that only 1 connection is doing this at a time, and unlock it when I'm done, but that seems like overkill. Would wrapping that in a transaction do the same thing (ensuring no other connection attempts the same process while another is still processing)? Or would a SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE be better?

Alexandrina answered 19/11, 2010 at 16:0 Comment(0)
P
234

Locking tables prevents other DB users from affecting the rows/tables you've locked. But locks, in and of themselves, will NOT ensure that your logic comes out in a consistent state.

Think of a banking system. When you pay a bill online, there's at least two accounts affected by the transaction: Your account, from which the money is taken. And the receiver's account, into which the money is transferred. And the bank's account, into which they'll happily deposit all the service fees charged on the transaction. Given (as everyone knows these days) that banks are extraordinarily stupid, let's say their system works like this:

$balance = "GET BALANCE FROM your ACCOUNT";
if ($balance < $amount_being_paid) {
    charge_huge_overdraft_fees();
}
$balance = $balance - $amount_being paid;
UPDATE your ACCOUNT SET BALANCE = $balance;

$balance = "GET BALANCE FROM receiver ACCOUNT"
charge_insane_transaction_fee();
$balance = $balance + $amount_being_paid
UPDATE receiver ACCOUNT SET BALANCE = $balance

Now, with no locks and no transactions, this system is vulnerable to various race conditions, the biggest of which is multiple payments being performed on your account, or the receiver's account in parallel. While your code has your balance retrieved and is doing the huge_overdraft_fees() and whatnot, it's entirely possible that some other payment will be running the same type of code in parallel. They'll be retrieve your balance (say, $100), do their transactions (take out the $20 you're paying, and the $30 they're screwing you over with), and now both code paths have two different balances: $80 and $70. Depending on which ones finishes last, you'll end up with either of those two balances in your account, instead of the $50 you should have ended up with ($100 - $20 - $30). In this case, "bank error in your favor".

Now, let's say you use locks. Your bill payment ($20) hits the pipe first, so it wins and locks your account record. Now you've got exclusive use, and can deduct the $20 from the balance, and write the new balance back in peace... and your account ends up with $80 as is expected. But... uhoh... You try to go update the receiver's account, and it's locked, and locked longer than the code allows, timing out your transaction... We're dealing with stupid banks, so instead of having proper error handling, the code just pulls an exit(), and your $20 vanishes into a puff of electrons. Now you're out $20, and you still owe $20 to the receiver, and your telephone gets repossessed.

So... enter transactions. You start a transaction, you debit your account $20, you try to credit the receiver with $20... and something blows up again. But this time, instead of exit(), the code can just do rollback, and poof, your $20 is magically added back to your account.

In the end, it boils down to this:

Locks keep anyone else from interfering with any database records you're dealing with. Transactions keep any "later" errors from interfering with "earlier" things you've done. Neither alone can guarantee that things work out ok in the end. But together, they do.

in tomorrow's lesson: The Joy of Deadlocks.

Prunella answered 19/11, 2010 at 18:0 Comment(10)
I'm also/still confused. Say the receiver account had $100 in it to start and we are adding the $20 bill payment from our account. My understanding of transactions is that when they start, any in-transaction operation sees the database in the state it was at the beginning of the transaction. ie: until we change it, the receiver account has $100. So... when we add $20 we actually set a balance of $120. But what happens if, during our transaction, someone drained the receiver account to $0? Is this prevented somehow? Do they magically get $120 again? Is this why locks are needed as well?Bystreet
Yes, that's where locks come into play. A proper system would write-lock the record so that no one else could update the record while the transaction is progressing. A paranoid system would put an unconditional lock on the record so that no one could read the "stale" balance either.Prunella
Basically look at transactions as securing things inside your code path. Locks secure things across "parallel" code paths. Until deadlocks hit...Prunella
@MarcB, So why do we have to do locking explicitly if using transactions alone already guarantee that the locks are in place? Will there even be a case whereby we must do explicit locking because transactions alone are insufficient?Cultured
This answer is not correct and may lead to wrong conclusions. This statement: "Locks keep anyone else from interfering with any database records you're dealing with. Transactions keep any "later" errors from interfering with "earlier" things you've done. Neither alone can guarantee that things work out ok in the end. But together, they do." - would get you fired, it is extremely wrong and stupid See articles: en.wikipedia.org/wiki/ACID, en.wikipedia.org/wiki/Isolation_(database_systems) and dev.mysql.com/doc/refman/5.1/en/…Committeeman
Nice explanations .. Just I don't know how can I understand whether my table is transactions or locking? I'm using MySQL and InnoDB engine .. How can I understand my table is transactions or locking ?Bluestone
So if I just go to a cash machine and take 20$, (this means GET BALANCE, $balance = $balance -20, UPDATE BALANCE), then a LOCK would be sufficient and no transaction is needed, right?Hobo
oh geez, this is wrong on so many levels and still being upvoted... someone HAVING spare_time > 0 should do something hereSabatier
The nightmare for some programmers like me; looked at transaction, found them not to be effective alone; locks, quite dreadful to hold an entire table frozen. While considering SELECT FOR UPDATE ..., I had to implement my own custom checks.Slaveholder
what if lock does not have a timeout (or really long one), so there is no need for transactions?Interdental
D
28

I've started to research the same topic for the same reasons as you indicated in your question. I was confused by the answers given in SO due to them being partial answers and not providing the big picture. After I read couple documentation pages from different RDMS providers these are my takes:

TRANSACTIONS

Statements are database commands mainly to read and modify the data in the database. Transactions are scope of single or multiple statement executions. They provide two things:

  1. A mechanism which guaranties that all statements in a transaction are executed correctly or in case of a single error any data modified by those statements will be reverted to its last correct state (i.e. rollback). What this mechanism provides is called atomicity.
  2. A mechanism which guaranties that concurrent read statements can view the data without the occurrence of some or all phenomena described below.

Dirty read: A transaction reads data written by a concurrent uncommitted transaction.

Nonrepeatable read: A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).

Phantom read: A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.

Serialization anomaly: The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.

What this mechanism provides is called isolation and the mechanism which lets the statements to chose which phenomena should not occur in a transaction is called isolation levels.

As an example this is the isolation-level / phenomena table for PostgreSQL: enter image description here

If any of the described promises is broken by the database system, changes are rolled back and the caller notified about it.

How these mechanisms are implemented to provide these guaranties is described below.

LOCK TYPES

  1. Exclusive Locks: When an exclusive lock acquired over a resource no other exclusive lock can be acquired over that resource. Exclusive locks are always acquired before a modify statement (INSERT, UPDATE or DELETE) and they are released after the transaction is finished. To explicitly acquire exclusive locks before a modify statement you can use hints like FOR UPDATE(PostgreSQL, MySQL) or UPDLOCK (T-SQL).
  2. Shared Locks: Multiple shared locks can be acquired over a resource. However, shared locks and exclusive locks can not be acquired at the same time over a resource. Shared locks might or might not be acquired before a read statement (SELECT, JOIN) based on database implementation of isolation levels.

LOCK RESOURCE RANGES

  1. Row: single row the statements executes on.
  2. Range: a specific range based on the condition given in the statement (SELECT ... WHERE).
  3. Table: whole table. (Mostly used to prevent deadlocks on big statements like batch update.)

As an example the default shared lock behavior of different isolation levels for SQL-Server : enter image description here

DEADLOCKS

One of the downsides of locking mechanism is deadlocks. A deadlock occurs when a statement enters a waiting state because a requested resource is held by another waiting statement, which in turn is waiting for another resource held by another waiting statement. In such case database system detects the deadlock and terminates one of the transactions. Careless use of locks can increase the chance of deadlocks however they can occur even without human error.

SNAPSHOTS (DATA VERSIONING)

This is a isolation mechanism which provides to a statement a copy of the data taken at a specific time.

  1. Statement beginning: provides data copy to the statement taken at the beginning of the statement execution. It also helps for the rollback mechanism by keeping this data until transaction is finished.

  2. Transaction beginning: provides data copy to the statement taken at the beginning of the transaction.

All of those mechanisms together provide consistency.

When it comes to Optimistic and Pessimistic locks, they are just namings for the classification of approaches to concurrency problem.

Pessimistic concurrency control:

A system of locks prevents users from modifying data in a way that affects other users. After a user performs an action that causes a lock to be applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This is called pessimistic control because it is mainly used in environments where there is high contention for data, where the cost of protecting data with locks is less than the cost of rolling back transactions if concurrency conflicts occur.

Optimistic concurrency control:

In optimistic concurrency control, users do not lock data when they read it. When a user updates data, the system checks to see if another user changed the data after it was read. If another user updated the data, an error is raised. Typically, the user receiving the error rolls back the transaction and starts over. This is called optimistic because it is mainly used in environments where there is low contention for data, and where the cost of occasionally rolling back a transaction is lower than the cost of locking data when read.

For example by default PostgreSQL uses snapshots to make sure the read data didn't change and rolls back if it changed which is an optimistic approach. However, SQL-Server use read locks by default to provide these promises.

The implementation details might change according to database system you chose. However, according to database standards they need to provide those stated transaction guarantees in one way or another using these mechanisms. If you want to know more about the topic or about a specific implementation details below are some useful links for you.

  1. SQL-Server - Transaction Locking and Row Versioning Guide
  2. PostgreSQL - Transaction Isolation
  3. PostgreSQL - Explicit Locking
  4. MySQL - Consistent Nonlocking Reads
  5. MySQL - Locking
  6. Understanding Isolation Levels (Video)
Deltadeltaic answered 26/2, 2021 at 8:3 Comment(0)
A
20

You want a SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE inside a transaction, as you said, since normally SELECTs, no matter whether they are in a transaction or not, will not lock a table. Which one you choose would depend on whether you want other transactions to be able to read that row while your transaction is in progress.

http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html

START TRANSACTION WITH CONSISTENT SNAPSHOT will not do the trick for you, as other transactions can still come along and modify that row. This is mentioned right at the top of the link below.

If other sessions simultaneously update the same table [...] you may see the table in a state that never existed in the database.

http://dev.mysql.com/doc/refman/5.0/en/innodb-consistent-read.html

Afroamerican answered 19/11, 2010 at 16:24 Comment(0)
S
13

Transaction concepts and locks are different. However, transaction used locks to help it to follow the ACID principles. If you want to the table to prevent others to read/write at the same time point while you are read/write, you need a lock to do this. If you want to make sure the data integrity and consistence, you had better use transactions. I think mixed concepts of isolation levels in transactions with locks. Please search isolation levels of transactions, SERIALIZE should be the level you want.

Spanish answered 23/11, 2015 at 19:35 Comment(1)
This should be the correct answer. Locking is for preventing race conditions, and transactions is for updating multiple tables with dependent data. Two totally different concepts, despite that transactions use locks.Doubleganger
G
6

I had a similar problem when attempting a IF NOT EXISTS ... and then performing an INSERT which caused a race condition when multiple threads were updating the same table.

I found the solution to the problem here: How to write INSERT IF NOT EXISTS queries in standard SQL

I realise this does not directly answer your question but the same principle of performing an check and insert as a single statement is very useful; you should be able to modify it to perform your update.

Goodden answered 19/11, 2010 at 16:6 Comment(0)
V
2

You are confused with lock & transaction. They are two different things in RMDB. Lock prevents concurrent operations while transaction focuses on data isolation. Check out this great article for the clarification and some graceful solution.

Velarium answered 31/8, 2015 at 10:27 Comment(3)
Locks prevent others from interfering with records you are working with describes what it does succinctly, and transactions prevent later errors (those of others making changes in parallel) from interfering with earlier things you've done (by allows rollback in the event someone did something in parallel) pretty much sums up transactions...what is confused about his comprehension of these topics?Pectoralis
sorry to ask it here, but i'm a little consfused here, do we need extra LOCKS such as EXCLUSIVE or SHARED besides ISOLATION LEVELS inside a transaction (because they internally use them) ? or choosing suitable isolation level is enough ? is there any senario that both of them needed ?Damico
Choosing the right isolation level is often enough because the DBMS will automatically handle the locking mechanisms required to maintain the integrity as per the chosen isolation level. However, there might be scenarios where you need to explicitly acquire locks, eg., explicit locks can be used strategically to avoid deadlocks by ensuring a consistent locking order across transactions or to ensure data consistency in complex transactions or when dealing with long-running operations.Velarium
C
1

I'd use a

START TRANSACTION WITH CONSISTENT SNAPSHOT;

to begin with, and a

COMMIT;

to end with.

Anything you do in between is isolated from the others users of your database if your storage engine supports transactions (which is InnoDB).

Cassette answered 19/11, 2010 at 16:6 Comment(4)
Except the table he's selecting from won't be locked to other sessions unless he specifically locks it (or until his UPDATE happens), which means other sessions could come along and modify it between the SELECT and the UPDATE.Afroamerican
After reading up on START TRANSACTION WITH CONSISTENT SNAPSHOT in the MySQL documentation, I don't see where it actually locks out another connection from updating the same row. My understanding is that it would see however the table started at the beginning of the transaction. So if another transaction is in progress, has already gotten a row and is about to update it, the 2nd transaction would still see the row before it has been updated. It could therefor potentially try to update the same row the other transaction is about to. Is that correct or am I missing something in the progress?Alexandrina
@Alexandrina It doesn't do any locking; you are correct. Locking (or not) is determined by the type of operations you do (SELECT/UPDATE/DELETE).Afroamerican
I see. It does give your own transaction read consistency, but doesn't block other users from modifying a row just before you did.Cassette

© 2022 - 2024 — McMap. All rights reserved.