Can I do transactions and locks in CouchDB?

I need transactions (begin, commit, rollback) and locks (SELECT FOR UPDATE). How can I do this in a document-model DB?

Edit:

The case is this:

  • I want to run an auctions site.
  • I'm also thinking about how to handle direct purchases.
  • In a direct purchase I have to decrement the quantity field in the item record, but only if the quantity is greater than zero. That is why I need locks and transactions.
  • I don't know how to address that without locks and/or transactions.

Can I solve this with CouchDB?

Guelders answered 18/11, 2008 at 18:46 Comment(0)

No. CouchDB uses an "optimistic concurrency" model. In the simplest terms, this just means that you send a document version along with your update, and CouchDB rejects the change if the current document version doesn't match what you've sent.

It's deceptively simple, really. You can reframe many familiar transaction-based scenarios for CouchDB. You do need to sort of throw out your RDBMS domain knowledge when learning CouchDB, though. It's helpful to approach problems from a higher level, rather than attempting to mold CouchDB to a SQL-based world.

Keeping track of inventory

The problem you outlined is primarily an inventory issue. If you have a document describing an item, and it includes a field for "quantity available", you can handle concurrency issues like this:

  1. Retrieve the document, take note of the _rev property that CouchDB sends along
  2. Decrement the quantity field, if it's greater than zero
  3. Send the updated document back, using the _rev property
  4. If the _rev matches the currently stored number, be done!
  5. If there's a conflict (when _rev doesn't match), retrieve the newest document version

In this instance, there are two possible failure scenarios to think about. If the most recent document version has a quantity of 0, you handle it just like you would in a RDBMS and alert the user that they can't actually buy what they wanted to purchase. If the most recent document version has a quantity greater than 0, you simply repeat the operation with the updated data, and start back at the beginning. This forces you to do a bit more work than an RDBMS would, and could get a little annoying if there are frequent, conflicting updates.
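The steps above can be sketched with an in-memory stand-in for CouchDB's `_rev` check. The `db` object and `purchase` helper here are illustrative, not part of any CouchDB client library; a real client would issue GET and PUT requests and see a 409 Conflict instead of a thrown error:

```javascript
// In-memory stand-in for CouchDB's optimistic concurrency check:
// an update is rejected unless it carries the current _rev.
const db = {
  docs: { "hammer": { _id: "hammer", _rev: 1, quantity: 3 } },
  get(id) { return { ...this.docs[id] }; },
  put(doc) {
    if (doc._rev !== this.docs[doc._id]._rev) {
      throw new Error("conflict"); // 409 Conflict in real CouchDB
    }
    this.docs[doc._id] = { ...doc, _rev: doc._rev + 1 };
  }
};

// Steps 1-5 from above: read, decrement, write back, retry on conflict.
function purchase(id) {
  for (;;) {
    const doc = db.get(id);              // 1. fetch doc, noting its _rev
    if (doc.quantity <= 0) return false; // sold out: report failure
    doc.quantity -= 1;                   // 2. decrement the quantity
    try {
      db.put(doc);                       // 3-4. write back with that _rev
      return true;                       // accepted: purchase succeeded
    } catch (e) {
      // 5. conflict: someone updated the doc first; loop and retry
    }
  }
}
```

With an initial quantity of 3, the first three calls to `purchase("hammer")` succeed and the fourth reports the item is sold out.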

Now, the answer I just gave presupposes that you're going to do things in CouchDB in much the same way that you would in an RDBMS. I might approach this problem a bit differently:

I'd start with a "master product" document that includes all the descriptor data (name, picture, description, price, etc). Then I'd add an "inventory ticket" document for each specific instance, with fields for product_key and claimed_by. If you're selling a model of hammer, and have 20 of them to sell, you might have documents with keys like hammer-1, hammer-2, etc, to represent each available hammer.

Then, I'd create a view that gives me a list of available hammers, with a reduce function that lets me see a "total". These are completely off the cuff, but should give you an idea of what a working view would look like.

Map

function(doc) 
{ 
    // Only unclaimed tickets; keyed by product so the view can be
    // queried per product.
    if (doc.type == 'inventory_ticket' && doc.claimed_by == null) { 
        emit(doc.product_key, { 'inventory_ticket' : doc._id, '_rev' : doc._rev }); 
    } 
}

This gives me a list of available "tickets", by product key. I could grab a group of these when someone wants to buy a hammer, then iterate through sending updates (using the id and _rev) until I successfully claim one (previously claimed tickets will result in an update error).
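That claim loop might look like the following sketch. `tryClaim` is a hypothetical helper that would PUT the ticket with its `_rev` and `claimed_by` set, returning false on a 409 conflict:

```javascript
// Try each candidate ticket in turn; the first successful write wins.
// A conflict means another buyer claimed that ticket first, so move on.
function claimAny(tickets, buyer, tryClaim) {
  for (const t of tickets) {
    if (tryClaim(t, buyer)) return t; // claimed this ticket
  }
  return null; // every candidate was taken; re-query the view
}
```

If `claimAny` returns null, the buyer re-queries the view for a fresh batch of tickets, or learns the product is sold out when the view is empty.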

Reduce

function (keys, values, rereduce) {
    // On rereduce, values are partial counts from earlier reduce calls;
    // otherwise, count the emitted rows.
    if (rereduce) {
        return sum(values);
    }
    return values.length;
}

This reduce function simply returns the total number of unclaimed inventory_ticket items, so you can tell how many "hammers" are available for purchase.

Caveats

This solution represents roughly 3.5 minutes of total thinking for the particular problem you've presented. There may be better ways of doing this! That said, it does substantially reduce conflicting updates, and cuts down on the need to respond to a conflict with a new update. Under this model, you won't have multiple users attempting to change data in primary product entry. At the very worst, you'll have multiple users attempting to claim a single ticket, and if you've grabbed several of those from your view, you simply move on to the next ticket and try again.

Reference: https://wiki.apache.org/couchdb/Frequently_asked_questions#How_do_I_use_transactions_with_CouchDB.3F

Rainband answered 18/11, 2008 at 18:51 Comment(12)
It's not clear to me how having 'tickets' that you attempt to claim in sequence is a significant improvement over simply retrying the read/modify/write to update the master entity. Certainly it doesn't seem worth the extra overhead, especially if you have large amounts of stock.Uund
From my perspective, the ticket convention is "simpler" to build. Failed updates on the master entry require you to reload the document, perform your operation again, and then save. The ticket thing allows you to try and "claim" something without having to request more data.Rainband
Also, it depends what sort of overhead you're worried about. You're either going to fight with increased contention, or have additional storage requirements. Given that a ticket can also double as a purchase record, I don't know that there'd be as much of a storage problem as you think.Rainband
I am editing a quantity field of a product document. Then I must create thousands of "tickets" if quantity=2K, for example. Then, when reducing the quantity, I must delete some tickets. Sounds completely unrelaxed to me. A lot of headache in basic use cases. Maybe I am missing something, but why not bring back the previously removed transaction behavior, just make it optional with something like _bulk_docs?reject_on_conflict=true. Quite useful in single-master configurations.Gettogether
Bulk inserts for tickets doesn't seem like a huge deal to me. Depending on your setup, you could just add a few tickets at a time and put more in as quantities change. You'll likely need some sort of document per quantity reduction in any case. If you reduce quantity because someone bought one, the ticket can also serve as a purchase record for that particular item. Same goes for returns or most anything else that reduces quantity.Rainband
Note that the read/modify/write gives you correct results only when running on a single CouchDB server. When replication is involved, application instances talking to different servers can all successfully perform conflicting updates, and you're left with two conflicting versions in the database.Lempira
To avoid contention an inProgress field could be added that would be set to true as soon as someone clicks the buy button. If they time out or cancel it would be reset and go back onto the inventory count.Picky
You answered the locking question (kinda) but not really the transactions question. What if you need to update 2 different documents atomically? I.e. a bank account transfer, you want to debit one account and credit another account, but if either update fails then both updates should be canceled. Can that be done in Couch? (This isn't a criticism, I'm trying to understand the philosophy and intent of Couch.)Instep
@mehaase: Read this: guide.couchdb.org/draft/recipes.html, the answer boils down to couchdb's internal datastructure "you never change data, you just append new". In your scenario that means creating one (atomic) transaction from account to an in-transit account for the debit and a second (atomic) transaction from the in-transit account forward (or back). That is how real banks do it. Every step is always documented.Trucking
@MrKurt: What happens if i claim a hammer and then my database crashes before i can insert the invoice-document that uses the hammer?Trucking
You don't need to emit the docid and rev in the value of your view, you get those for free in view results. Doing so, especially in a verbose way like a dictionary, is just going to grow your view unnecessarily large.Vanscoy
@FabianZeindl not entirely sure what you mean by "claim a hammer", but - at least using the views described in my answer below - you write a single document that uses the hammer. Your code writes the document, and on a successful write, continues with whatever it wanted to do with said hammer. If the db write fails, you don't use the hammer because you haven't successfully "claimed" itVanscoy

Expanding on MrKurt's answer. For lots of scenarios you don't need to have stock tickets redeemed in order. Instead of selecting the first ticket, you can select randomly from the remaining tickets. Given a large number of tickets and a large number of concurrent requests, you will get much reduced contention on those tickets, versus everyone trying to get the first ticket.
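A sketch of that random selection, assuming `rows` has the shape of the rows array in a CouchDB view response:

```javascript
// Pick a random row from the view results instead of always trying
// rows[0], so concurrent buyers spread across different tickets.
function pickRandomTicket(rows) {
  if (rows.length === 0) return null; // nothing left to claim
  return rows[Math.floor(Math.random() * rows.length)];
}

// Example rows, shaped like a CouchDB view response for one product.
const rows = [
  { id: "hammer-1", key: "hammer" },
  { id: "hammer-2", key: "hammer" },
  { id: "hammer-3", key: "hammer" }
];
```

The buyer then attempts to claim the chosen ticket exactly as before, falling back to another random ticket on conflict.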

Leftward answered 25/11, 2008 at 10:2 Comment(0)

A design pattern for RESTful transactions is to create a "tension" in the system. For the popular example use case of a bank account transfer, you must ensure that the totals of both involved accounts are updated:

  • Create a transaction document "transfer USD 10 from account 11223 to account 88733". This creates the tension in the system.
  • To resolve any tension, scan all transaction documents and:
    • If the source account is not updated yet update the source account (-10 USD)
    • If the source account was updated but the transaction document does not show this then update the transaction document (e.g. set flag "sourcedone" in the document)
    • If the target account is not updated yet update the target account (+10 USD)
    • If the target account was updated but the transaction document does not show this then update the transaction document
    • If both accounts have been updated you can delete the transaction document or keep it for auditing.

The scanning for tension should be done in a backend process over all "tension documents" to keep the periods of tension in the system short. In the above example there will be a short, anticipated inconsistency after the first account has been updated but before the second is. This must be taken into account the same way you'd deal with eventual consistency if your CouchDB is distributed.

Another possible implementation avoids the need for transactions completely: just store the tension documents and evaluate the state of your system by evaluating every involved tension document. In the example above, this would mean that the total for an account is determined only as the sum of the values in the transaction documents in which that account is involved. In CouchDB you can model this very nicely as a map/reduce view.
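Such a view could look like the following sketch. The `transfer` document shape and the `runView` harness are hypothetical; in CouchDB itself, the map function would live in a design document and the reduce would be the built-in `_sum`:

```javascript
// Hypothetical "tension" documents recording transfers:
//   { type: "transfer", from: "acct-11223", to: "acct-88733", amount: 10 }
// Map function as it might appear in a design document; emit is passed
// in here only so the sketch runs outside CouchDB.
function transferMap(doc, emit) {
  if (doc.type === "transfer") {
    emit(doc.from, -doc.amount); // debit the source account
    emit(doc.to, doc.amount);    // credit the target account
  }
}

// Tiny stand-in for the view engine: fold all emitted values per key,
// which is what the built-in _sum reduce returns for a grouped query.
function runView(mapFn, docs) {
  const balances = {};
  const emit = (key, value) => { balances[key] = (balances[key] || 0) + value; };
  for (const doc of docs) mapFn(doc, emit);
  return balances;
}
```

Querying with group=true then yields each account's balance as the sum of every transfer touching it, with no account document ever being updated.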

Rhett answered 18/11, 2008 at 18:46 Comment(9)
But what about cases where the account is debited but the tension doc isn't changed? Any failure scenario between those two points, if they are not atomic, will cause permanent inconsistency, right? Something about the process has to be atomic, that's the point of a transaction.Unhallow
Yes, you're correct, in this case -- while the tension is not resolved -- there will be inconsistency. However, the inconsistency is only temporary, until the next scan for tension documents detects this. That's the trade-off in this case, a kind of eventual consistency regarding time. As long as you decrement the source account first and later increment the target account, this can be acceptable. But beware: tension documents won't give you ACID transactions on top of REST. But they can be a good trade-off between pure REST and ACID.Rhett
Imagine every tension document has a timestamp, and account documents have a 'last-tension-applied' field - or a list of applied tensions. When you debit the source account you also update the 'last-tension-applied' field. Those two operations are atomic because they are on the same document. The target account also has a similar field. That way the system can always tell which tension docs have been applied to which accounts.Confidante
How to detect whether the source/destination document was updated already? What if it fails after step 1, then is re-executed and fails again, and so on, you'll keep deducting the source account?Humidity
@wump: you will need to record that the tension document has been applied onto the account. e.g. by appending the tension document id onto a list property of either account. when all accounts touched by the tension document have been updated then mark the tension document as "done" or delete it. Afterwards the document id can be dropped from the list for all accounts.Rhett
Thanks! That makes it clear to me. Just temporarily keep a list of the applied tension documents in a separate field, which can be updated atomically. A 'last-tension-applied' will also work but is harder because it assumes monotonically increasing ordering, which takes special care in a distributed setting.Humidity
@ordnungswidrig, I love this concept of tension documents, but the "decrement" and "timestamp applied" operations you are mentioning are two discrete operations, two separate HTTP POST calls to update the same doc which is why I am still confused. Can you clarify how after you do a POST to decrement the first account, and then the server crashes, and you start everything back up, how you avoid re-decrementing? I suppose if you wrote a key into the decrement field ("total":"$121.00[applied-timestamp]") that would work, but is messy.Hyder
Ahh! I retract my question, I see now that the Bulk API includes an optional transactional behavior for committing multiple changes. Very cool tip on the tension documents guys, appreciate that! wiki.apache.org/couchdb/…Hyder
All this only seems to work if the transactions reading or writing the same documents are serialized, i.e., one is executed fully after another. @RiyadKalla The Bulk API is not transactional anymore ...Philemon

No, CouchDB is not generally suitable for transactional applications because it doesn't support atomic operations in a clustered/replicated environment.

CouchDB sacrificed transactional capability in favor of scalability. In order to have atomic operations you need a central coordination system, which limits your scalability.

If you can guarantee that you only have one CouchDB instance, or that everyone modifying a particular document connects to the same instance, then you could use the conflict detection system to create a sort of atomicity using the methods described above. But if you later scale up to a cluster or use a hosted service like Cloudant, it will break down and you'll have to redo that part of the system.

So, my suggestion would be to use something other than CouchDB for your account balances, it will be much easier that way.

Kicker answered 25/10, 2011 at 3:28 Comment(0)

As a response to the OP's problem, Couch is probably not the best choice here. Using views is a great way to keep track of inventory, but clamping to 0 is more or less impossible. The problem is the race condition between reading the result of a view, deciding you're OK to use a "hammer-1" item, and then writing a doc to use it. There's no atomic way to write the doc that uses the hammer only if the view says there are > 0 hammer-1's. If 100 users all query the view at the same time and see 1 hammer-1, they can all write a doc to use a hammer-1, resulting in -99 hammer-1's. In practice, the race window will be fairly small - really small if your DB is running on localhost. But once you scale, and have an off-site DB server or cluster, the problem will get much more noticeable. Regardless, it's unacceptable to have a race condition of that sort in a critical, money-related system.

An update to MrKurt's response (it may just be dated, or he may have been unaware of some CouchDB features)

A view is a good way to handle things like balances / inventories in CouchDB.

You don't need to emit the docid and rev in a view. You get both of those for free when you retrieve view results. Emitting them - especially in a verbose format like a dictionary - will just grow your view unnecessarily large.

A simple view for tracking inventory balances should look more like this (also off the top of my head)

function( doc )
{
    if( doc.InventoryChange != undefined ) {
        // Emit each product's change amount, so _sum yields the balance
        for( var product_key in doc.InventoryChange ) {
            emit( product_key, doc.InventoryChange[product_key] );
        }
    }
}

And the reduce function is even more simple

_sum

This uses a built in reduce function that just sums the values of all rows with matching keys.

In this view, any doc can have a member "InventoryChange" that maps product_keys to changes in their inventory totals, i.e.

{
    "_id": "abc123",
    "InventoryChange": {
         "hammer_1234": 10,
         "saw_4321": 25
     }
}

Would add 10 hammer_1234's and 25 saw_4321's.

{
    "_id": "def456",
    "InventoryChange": {
        "hammer_1234": -5
    }
}

Would burn 5 hammers from the inventory.

With this model, you're never updating any data, only appending. This means there's no opportunity for update conflicts. All the transactional issues of updating data go away :)

Another nice thing about this model is that ANY document in the DB can both add and subtract items from the inventory. These documents can have all kinds of other data in them. You might have a "Shipment" document with a bunch of data about the date and time received, warehouse, receiving employee etc. and as long as that doc defines an InventoryChange, it'll update the inventory. As could a "Sale" doc, and a "DamagedItem" doc etc. Looking at each document, they read very clearly. And the view handles all the hard work.
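To make that concrete, here's the same map-plus-`_sum` logic folded by a tiny illustrative harness. CouchDB maintains the view incrementally; the harness only mimics the result of querying it with group=true:

```javascript
// Illustrative harness: apply the InventoryChange map logic to every
// doc and fold the emitted values, mimicking the _sum reduce grouped
// by product key.
function currentInventory(docs) {
  const totals = {};
  for (const doc of docs) {
    if (doc.InventoryChange !== undefined) {
      for (const product_key in doc.InventoryChange) {
        totals[product_key] =
          (totals[product_key] || 0) + doc.InventoryChange[product_key];
      }
    }
  }
  return totals;
}

// The two example documents from above: a stock receipt and a sale.
const docs = [
  { _id: "abc123", InventoryChange: { hammer_1234: 10, saw_4321: 25 } },
  { _id: "def456", InventoryChange: { hammer_1234: -5 } }
];
```

Folding those two documents leaves 5 hammer_1234's and 25 saw_4321's in inventory, matching the prose above.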

Vanscoy answered 9/8, 2013 at 16:17 Comment(2)
Interesting strategy. As a CouchDB newb it would appear that in order to calculate the current number of hammers, you need to perform a map/reduce over the company's entire history of inventory changes for hammers. This could be years worth of changes. Is there some built-in feature of CouchDB which will make this performant?Loisloise
Yes, views in CouchDB are like a continuous, persistent map/reduce. You're correct that doing it from scratch on a large data set would take ages, but when new documents are added, they only update the existing view; it doesn't have to recalculate the entire view. Bear in mind there is both a space and CPU requirement for views. Also, at least when I worked with CouchDB professionally (it's been a few years), it was pretty important to only use the built-in reduce functions, i.e. _sum. Custom JavaScript reduce functions were extremely slowVanscoy

Actually, you can in a way. Have a look at the HTTP Document API and scroll down to the heading "Modify Multiple Documents With a Single Request".

Basically you can create/update/delete a bunch of documents in a single post request to URI /{dbname}/_bulk_docs and they will either all succeed or all fail. The document does caution that this behaviour may change in the future, though.
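A request body for that endpoint might look like this (data only; the `docs` array can mix creations, updates carrying a `_rev`, and deletions marked with `_deleted` - the example `_id` and `_rev` values are made up):

```javascript
// Body for POST /{dbname}/_bulk_docs: create one doc, update another
// (it carries a _rev), and delete a third, all in one request.
const body = {
  docs: [
    { _id: "order-1", status: "new" },                 // create
    { _id: "item-7", _rev: "2-abc", quantity: 4 },     // update
    { _id: "stale-doc", _rev: "5-def", _deleted: true } // delete
  ]
};
const payload = JSON.stringify(body);
```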

EDIT: As predicted, from version 0.9 the bulk docs no longer works this way.

Bluecollar answered 20/2, 2009 at 8:9 Comment(2)
That wouldn't really help in the situation being discussed, i.e. contention on single docs from multiple users.Leftward
Starting with CouchDB 0.9, the semantics of bulk updates have changed.Eurystheus

Just use a lightweight solution like SQLite for transactions; when a transaction completes successfully, replicate it, and mark it replicated in SQLite.

SQLite table

txn_id         txn_attribute1   txn_attribute2   ...   txn_status
dhwdhwu$sg1    x                y                      added/replicated

You can also delete the transactions which are replicated successfully.

Preclude answered 21/6, 2017 at 4:10 Comment(0)
