Pyramid REST API: How do I handle concurrent data access safely?
I am working on a REST API for a web service using Pyramid and Cornice; data on the server side is handled with SQLAlchemy and MySQL. The web server is nginx with uWSGI, configured to run multiple Python processes:

[uwsgi]
socket = localhost:6542
plugins = python34
...
processes = 2 # spawn the specified number of workers/processes
threads = 2 # run each worker in prethreaded mode with the specified number of threads

Problem

Suppose there is a table customers on the server side. Using the API one can read, modify, or delete customer data; in addition, other API functions read customer data as part of their work.

I could issue multiple API calls at the same time which then compete for the same customer resource:

# Write/modify the customer {id} data
curl --request POST ... https://some.host/api/customer/{id}
# Delete customer {id} and all of its associated data
curl --request DELETE https://some.host/api/customer/{id}
# Perform some function which reads customer {id}
curl --request GET ... https://some.host/api/do-work

Essentially this is the readers-writers problem, but because more than one process is involved, in-process thread synchronization using locks/mutexes/semaphores won't work here.

Question

I'd like to understand the best way to implement locking and synchronization for such a Pyramid-based web API, so that concurrent calls like those in the example above are handled safely and efficiently (i.e. without unnecessary serialization).

Solutions (?)

Hygienic answered 26/11, 2015 at 9:24 Comment(0)
I assume you are dealing with a single MySQL database and your locks do not need to cover other resources (Redis, third-party APIs, etc.). I also assume your client-side functions do not themselves need transactional state spanning several API calls; you just want to prevent concurrent API access from messing up your database.

There are two kinds of locking: pessimistic locking and optimistic locking.

Pessimistic locking is what most people usually mean by locking - you create and acquire locks up front, programmatically in code. This is what a distributed lock manager provides.

Optimistic locking is what you can quite easily get with SQL databases. If two transactions compete for the same resource, the database effectively dooms one of them, and the application framework (in this case Pyramid + pyramid_tm) can retry the transaction N times before giving up.

Optimistic locking is the more elegant solution from a development point of view, as it does not put any cognitive load on the application developer to remember to lock resources correctly or to build in-house locking mechanisms. Instead, the developer relies on the framework and the database to retry and resolve concurrency conflicts. However, optimistic locking is not that well known among web developers, because it is difficult to do in widespread PHP-style environments due to the lack of flexibility in the programming language.

pyramid_tm implements an optimistic-locking solution, and I recommend using it (or some other optimistic-locking solution) unless you have a very specific reason not to.
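The retry behaviour described above can be pictured with a minimal stdlib sketch. Note this is an illustration of the idea, not pyramid_tm's actual API: ConflictError and run_with_retries are made-up names, and in a real app pyramid_tm does this transparently around each HTTP request, reacting to the database's serialization error.

```python
class ConflictError(Exception):
    """Stand-in for a database serialization/conflict error."""

def run_with_retries(txn_fn, attempts=3):
    """Run a transactional function, retrying on conflict.

    This mirrors what the transaction machinery does per request:
    if the database dooms the transaction, run it again, up to
    `attempts` times before giving up.
    """
    for attempt in range(attempts):
        try:
            return txn_fn()
        except ConflictError:
            if attempt == attempts - 1:
                raise  # give up; surface the error to the API consumer
```

A transaction that conflicts twice and then succeeds would thus commit on the third attempt, invisibly to the API consumer.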

  • pyramid_tm ties the transaction life cycle to the HTTP request, which is very natural from the web developer's point of view

  • pyramid_tm can tie other events to successful transactions, e.g. pyramid_mailer sends out email to users only if the transaction commits

  • pyramid_tm is well tested and based on the ZODB transaction manager (the transaction package), which has been in production use since the early 2000s

  • Make sure your SQLAlchemy session is set to the SERIALIZABLE SQL isolation level - that way you start with the strongest consistency model. You can relax this for performance where you know an API call tolerates it - e.g. calls doing read-only statistical analysis.

  • Optimistic locking usually performs better in "normal" read-heavy, write-light workloads where conflicts are rare (two API calls updating the same customer at once). The transaction-retry penalty is paid only when a conflict actually occurs.

  • If the transaction ultimately fails after N retries, e.g. in an unusually high load situation, this should be resolved on the API consumer side: tell the client that the server-side data has changed and the user must verify or refill the form


May answered 30/11, 2015 at 6:23
Usually you start by determining what kind of consistency model is acceptable. The weaker your consistency requirements, the easier this problem becomes on the server side.

For example:

Is it possible to get away with optimistic concurrency? I.e. proceed as if you held the lock, execute your operation, but detect when a concurrent modification happened so you can properly recover? This can be a good option if you do not expect many collisions. SQLAlchemy, for example, should be able to detect that it is updating a row that has already been modified.
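SQLAlchemy's detection works through a version counter on the mapped table (the version_id_col mapper option). The underlying SQL pattern can be sketched with stdlib sqlite3 - the table and column names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (1, 'alice', 1)")
conn.commit()

def update_name(conn, cid, new_name, seen_version):
    """Optimistic update: succeeds only if nobody changed the row meanwhile."""
    cur = conn.execute(
        "UPDATE customers SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, cid, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means a concurrent writer won

# Both callers read the row at version 1; the second update is rejected.
first = update_name(conn, 1, "bob", 1)     # succeeds, version becomes 2
second = update_name(conn, 1, "carol", 1)  # stale version: detected conflict
```

The losing caller then re-reads the row and either retries or reports the conflict to the client.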

If that is not acceptable, then you could use distributed locking in Redis. You could probably use that to build some form of synchronization.
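With redis-py the usual starting point is its built-in lock helper, e.g. redis.Redis().lock("customer:1", timeout=30). Since that needs a running Redis server, here is the same pessimistic try-acquire/back-off pattern sketched with a stdlib file lock - note a file lock only coordinates processes on one host, which is exactly the limitation Redis removes:

```python
import fcntl
import os
import tempfile

lock_path = os.path.join(tempfile.mkdtemp(), "customer-1.lock")

def try_acquire(path):
    """Try to take an exclusive cross-process lock on `path`.

    Returns the open file object while the lock is held, or None
    if another process (or caller) already holds it.
    """
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking attempt
        return f
    except OSError:
        f.close()
        return None

def release(f):
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()

holder = try_acquire(lock_path)      # first API call takes the lock
competitor = try_acquire(lock_path)  # concurrent call is told to back off
release(holder)                      # resource is free again
```

A caller that gets None can either fail fast with a 409-style response or sleep and retry, mirroring what you would do with a Redis lock's blocking timeout.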

Risley answered 26/11, 2015 at 18:30 Comment(1)
Thanks! WRT SQLAlchemy, this answer seems to be a viable solution. Just have to find a way to integrate that with the Pyramid session handling. – Hygienic

© 2022 - 2024 — McMap. All rights reserved.