RESTful idempotence

I'm designing a RESTful web service using ROA (Resource-Oriented Architecture).

I'm trying to work out an efficient way to guarantee idempotence for PUT requests that create new resources in cases where the server assigns the resource key.

From my understanding, the traditional approach is to create a kind of transaction resource such as /CREATE_PERSON. The client-server interaction for creating a new person resource would then have two parts:

Step 1: Get a unique transaction id for creating the new PERSON resource:

**Client request:**
POST /CREATE_PERSON

**Server response:**
200 OK
transaction-id:"as8yfasiob"

Step 2: Create the new person resource in a request guaranteed to be unique by using the transaction id:

**Client request:**
PUT /CREATE_PERSON/{transaction_id}
first_name="Big bubba"

**Server response:**
201 Created             // (If the request is a duplicate, it would send this
PersonKey="398u4nsdf"   // same response without creating a new resource. It
                        // would perhaps send an error response if the
                        // transaction id were reused for a non-duplicate
                        // request, but I control the client, so I can
                        // guarantee that this won't happen.)
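
The two-step flow above can be modelled roughly as follows. This is a minimal in-memory Python sketch, not the poster's implementation: the TransactionStore class, its method names, the uuid4 keys, and raising KeyError where a real server would answer with an error such as 404 are all assumptions for illustration. It only shows why the PUT in step 2 is safe to retry.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TransactionStore:
    """Hypothetical in-memory model of the /CREATE_PERSON transaction resource."""

    # transaction id -> person key (None until the PUT has been processed)
    transactions: Dict[str, Optional[str]] = field(default_factory=dict)
    # person key -> stored person data
    persons: Dict[str, dict] = field(default_factory=dict)

    def post_create_person(self) -> str:
        """Step 1: POST /CREATE_PERSON hands out a fresh transaction id."""
        tx_id = uuid.uuid4().hex
        self.transactions[tx_id] = None
        return tx_id

    def put_create_person(self, tx_id: str, first_name: str) -> Tuple[int, str]:
        """Step 2: PUT /CREATE_PERSON/{tx_id}; safe to repeat."""
        if tx_id not in self.transactions:
            # Unknown id: a real server would answer with an error such as 404.
            raise KeyError(tx_id)
        person_key = self.transactions[tx_id]
        if person_key is None:
            # First time we see this transaction: actually create the person.
            person_key = uuid.uuid4().hex
            self.persons[person_key] = {"first_name": first_name}
            self.transactions[tx_id] = person_key
        # Duplicates skip the creation above and get the same response.
        return 201, person_key


store = TransactionStore()
tx = store.post_create_person()
first = store.put_create_person(tx, "Big bubba")
retry = store.put_create_person(tx, "Big bubba")  # e.g. the first response was lost
```

The retry returns the same person key and leaves exactly one person stored, at the cost of the extra round trip in step 1.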

The problem I see with this approach is that it requires two requests to the server to perform the single operation of creating a new PERSON resource. The extra round trip is a performance cost that increases the chance the user will be left waiting for the client to complete the request.

I've been trying to hash out ideas for eliminating the first step, such as pre-sending transaction IDs with each request, but most of my ideas have other issues or sacrifice the statelessness of the application.

Is there a way to do this?

Edit:

The solution we ended up going with was for the client to acquire a UUID and send it along with the request. A UUID is a 128-bit (16-byte) value, so there are 2^128 possible UUIDs (a random, version-4 UUID has 122 random bits, since 6 bits encode the version and variant). Contrary to what a programmer might intuitively expect, it is accepted practice to generate a UUID randomly and assume that it is unique: the space of possible values is so large that the odds of randomly generating the same value twice are negligible in practice.

One caveat is that we have our clients request a UUID from the server (GET uuid/). This is because we cannot guarantee the environment our client runs in. If there were a problem on the client, such as a badly seeded random number generator, a UUID collision could very well occur.

Polik answered 2/6, 2010 at 3:21

You are using the wrong HTTP verb for your create operation. RFC 2616 specifies the semantics of the POST and PUT methods.

Section 9.5:

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line

Section 9.6:

The PUT method requests that the enclosed entity be stored under the supplied Request-URI.

There are subtle details to that behavior; for example, PUT can be used to create a new resource at the specified URL if one does not already exist. However, POST should never put the new entity at the request URL, and PUT should always put any new entity at the request URL. This relationship to the request URL is what makes POST map to CREATE and PUT map to UPDATE.

According to those semantics, if you want to use PUT to create a new person, it should be created at /CREATE_PERSON/{transaction_id}. In other words, the transaction ID returned by your first request should be the person key used to fetch that record later. You shouldn't make a PUT request to a URL that is not going to be the final location of the record.

Better yet, you can do this as a single atomic operation with a POST to /CREATE_PERSON. One request creates the new person record, and the response carries the new ID (which should also be referenced in the HTTP Location header).

Meanwhile, REST guidelines say that verbs should not be part of the resource URL. Thus, the URL for creating a new person should be the same as the URL for getting the list of all persons: /PERSONS (I prefer the plural form :-)).

Thus, your REST API becomes:

  • to get all persons - GET /PERSONS
  • to get single person - GET /PERSONS/{id}
  • to create new person - POST /PERSONS with the body containing the data for the new record
  • to update an existing person, or create a new person with a well-known id - PUT /PERSONS/{id} with the body containing the data for the updated record
  • to delete existing person - DELETE /PERSONS/{id}
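
As a rough illustration of that interface, here is a hypothetical in-memory dispatcher in Python that returns (status, headers, body) tuples instead of speaking real HTTP. The class name, the sequential integer ids, and the choice of 201 vs. 200 for PUT are assumptions made for the sketch, not part of the answer above.

```python
import itertools
from typing import Dict, Optional, Tuple

# Hypothetical in-memory version of the /PERSONS API listed above.
# handle() returns (status, headers, body) instead of writing real HTTP.
Response = Tuple[int, Dict[str, str], object]


class PersonsResource:
    def __init__(self) -> None:
        self.persons: Dict[str, dict] = {}
        self._ids = itertools.count(1)  # source of server-assigned ids

    def handle(self, method: str, path: str, body: Optional[dict] = None) -> Response:
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "PERSONS" or len(parts) > 2:
            return 404, {}, None
        person_id = parts[1] if len(parts) == 2 else None

        if person_id is None:  # the collection: /PERSONS
            if method == "GET":
                return 200, {}, list(self.persons.values())
            if method == "POST":
                new_id = str(next(self._ids))
                while new_id in self.persons:  # skip ids a client already PUT
                    new_id = str(next(self._ids))
                self.persons[new_id] = dict(body or {})
                return 201, {"Location": f"/PERSONS/{new_id}"}, self.persons[new_id]
            return 405, {}, None

        # a single person: /PERSONS/{id}
        if method == "GET":
            if person_id in self.persons:
                return 200, {}, self.persons[person_id]
            return 404, {}, None
        if method == "PUT":
            created = person_id not in self.persons
            self.persons[person_id] = dict(body or {})  # replace the whole record
            return (201 if created else 200), {}, self.persons[person_id]
        if method == "DELETE":
            if self.persons.pop(person_id, None) is None:
                return 404, {}, None
            return 204, {}, None
        return 405, {}, None
```

In this sketch a retried POST creates a second person, while a retried PUT to the same URL does not, which is the tension the rest of the thread is about.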

Note: I personally prefer not to use PUT for creating records for two reasons, unless I need to create a sub-record that has the same id as an already existing record from a different data set (also known as 'the poor man's foreign key' :-)).

Update: You are right that POST is not idempotent, and that is per the HTTP spec. POST always creates a new resource. In your example above, that new resource is the transaction context.

However, my point is that you want PUT to be used to create a new resource (a person record), and according to the HTTP spec, that new resource itself should be located at the request URL. Specifically, your approach breaks down because the URL you use with the PUT represents the transactional context created by the POST, not the new resource itself. In other words, the person record is a side effect of updating the transaction record, not the immediate result of it (which is the updated transaction record).

Of course, with this approach the PUT request will be idempotent, since once the person record is created and the transaction is 'finalized', subsequent PUT requests will do nothing. But now you have a different problem - to actually update that person record, you will need to make a PUT request to a different URL - one that represents the person record, not the transaction in which it was created. So now you have two separate URLs your API clients have to know and make requests against to manipulate the same resource.

Or you could copy a complete representation of the latest resource state into the transaction record as well, and route person record updates through the transaction URL. But at that point, the transaction URL is for all intents and purposes the person record, which means it was created by the POST request in the first place.

Interstitial answered 2/6, 2010 at 3:56
If you'd like the resource name to sound more like a noun, you could rename it to CREATE_PERSON_TRANSACTION. The request PUT CREATE_PERSON_TRANSACTION/{transaction_id} does not violate the rule that the resource must be stored at that URL, because GET CREATE_PERSON_TRANSACTION/{transaction_id} would return information about the transaction, such as whether it is in progress or finished. It would not return a person. - Polik
Using POST PERSONS/ to create a new person resource is not idempotent. Idempotent means that the request can be executed once or many times with the same result. This matters because an idempotent request can be re-sent by the client if the client does not receive a response. The reason I posted this question was to get ideas on creating resources idempotently. - Polik
@Franci I agree the resource names chosen are confusing, but I don't agree that the operations he is performing are a problem. Just change the names to POST /PersonIdentifiers and PUT /Person/{Identifier}. I see no problem with creating an identifier as a separate step. As long as no two identical identifiers are ever created, and there is no requirement that every identifier actually be used to create a person, it seems fine to me. - Ballata
@Darrel - that's what I was trying to get him to do. However, he wants to create a separate transaction id with the POST request, create the person record with a PUT to the transaction URL, and get back the actual person id. - Interstitial
@Franci OK, I missed that PersonKey. Yeah, I agree that's weird. The whole point of a uniform interface is that you do what people expect, not just conform to the letter of the law. - Ballata
@Franci - I just saw your update. I think you are right; the approach I proposed is too messy. I'm considering just letting the client create a GUID as the key to the resource. That way, to create or update a resource, they can just do PUT PERSON/{GUID}, and then GET PERSON/{GUID} to read it. Before, I hadn't considered that the client might be able to dictate the resource key. Thanks a lot for your feedback on this issue. - Polik
-1 I think this answer completely misses the point that if you POST something without a request id, there's no way to handle lost messages to the server, and hence no way to handle idempotence. - Soapwort
@Soapwort - huh?! POST is by definition not idempotent. Two identical POSTs should result in two new subordinate resources being created, as per HTTP. There are no "lost messages": if you POST and the new resource was created successfully, the response should contain the Location header with the URI of the new resource. - Interstitial
But the question is specifically asking about what happens when you factor real networks into the equation. I could imagine adding a header to the POST request with the request id so that the server can do deduplication. - Soapwort
@Soapwort - the question is how to avoid having two operations (GET id/PUT data). HTTP already provides a mechanism: POST the data, and you'll get the new id in the Location header of the response. Deduplication (not creating a second record with the same data) is orthogonal and is something the server business logic needs to do in both cases, as it depends exclusively on the semantics of the data. - Interstitial
@FranciPenov It's not orthogonal, though: it can be part of the request, thereby avoiding having two operations while still using POST. But look at the title: it's 'idempotent REST', and while you are 100% correct about the semantics of the verbs given no specific request id or similar, you are not actually answering the question about idempotence. - Soapwort
@Soapwort You are technically correct. As the OP comments, "The reason I posted this question was to get ideas on creating resources idempotently." The ways to do this are: 1) use the two requests GET id/PUT data; 2) generate a unique id on the client and PUT the data; or 3) POST the data and expect the server to return an error if the POST would result in duplicates. Your suggestion is a mix of 2) and 3), and will work, but it's too complex, prone to errors, and functionally equivalent to 3) alone. - Interstitial
@Soapwort ...but if you go with 3) alone, the solution is exactly what I answered with, and the deduplication is based on semantic parsing of the data and is completely orthogonal to delivering the data to the server. - Interstitial
@Soapwort ...by the way, the idea that the client can help the deduplication by adding the id to the POST assumes that the client has intimate knowledge of what the server considers duplicate data, which is not always the case for a public REST API. - Interstitial
@FranciPenov it's not complex; the problem is that the client can't access its MAC address, so it would have to base the uniqueness of the request id on something server-side, or on something like its user-agent string + ticks. Message ids are commonplace and pretty well known, so what I'm suggesting is a GET /msgids initially and then POST /resource with the header X-Request-Id: 56020ef. It's called out-of-band data, and because it is out-of-band, it's not actually part of the "business data" and therefore not part of something uniquely identifiable on the server (and not on the client)... - Soapwort
@FranciPenov That said, I would agree that posting twice would be OK if there were a way to get Location on the second reply. In that case, what HTTP code would you return? - Soapwort
@Soapwort HTTP 303 See Other, which says 'please do a GET on this other URL instead of the current request'. The Location header will contain the URL of the deduplicated resource. - Interstitial
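
A minimal Python sketch of the combination discussed in these comments: a POST carrying an out-of-band request id, answered with 201 Created the first time and 303 See Other (pointing at the existing resource) on a retry. The DedupingPersons class is hypothetical, and X-Request-Id is simply the header name suggested above, not one HTTP defines. A real server would also need the check-and-insert to be atomic (for example, a unique constraint on the request id) and some expiry policy for old ids.

```python
import uuid
from typing import Dict, Tuple


class DedupingPersons:
    """Hypothetical POST /PERSONS handler deduplicated by a request id header."""

    def __init__(self) -> None:
        self.persons: Dict[str, dict] = {}
        self.seen: Dict[str, str] = {}  # request id -> URL of the created person

    def post(self, headers: Dict[str, str], body: dict) -> Tuple[int, Dict[str, str]]:
        request_id = headers.get("X-Request-Id")
        if request_id is not None and request_id in self.seen:
            # Retry of a request we already processed: point the client at
            # the existing resource instead of creating a second one.
            return 303, {"Location": self.seen[request_id]}
        person_id = uuid.uuid4().hex  # server-assigned key
        self.persons[person_id] = dict(body)
        location = f"/PERSONS/{person_id}"
        if request_id is not None:
            self.seen[request_id] = location
        return 201, {"Location": location}
```

The client generates the request id once and sends the same value on every retry; a POST without the header behaves like an ordinary, non-idempotent POST.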

I just came across this post: Simple proof that GUID is not unique

Although the question is universally ridiculed, some of the answers go into deeper explanations of GUIDs. It seems that a GUID is a 128-bit number (2^128 possible values) and that the odds of randomly generating the same number twice are so low as to be impossible for all practical purposes.
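
To put a number on "so low", the usual birthday-bound approximation gives p ~ 1 - exp(-n(n-1)/(2N)) for the chance of at least one duplicate among n random ids drawn from N possible values. The sketch below assumes version-4 UUIDs, which have 122 random bits (N = 2^122) because 6 of the 128 bits are fixed to encode the version and variant; the function name is illustrative.

```python
import math


def collision_probability(n: int, bits: int = 122) -> float:
    """Approximate chance that n random `bits`-bit ids contain a duplicate."""
    space = 2.0 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,d} ids -> p ~ {collision_probability(n):.1e}")
```

Even a billion randomly generated ids give a duplicate probability on the order of 1e-19, which is why the approach holds up as long as the random source is sound.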

Perhaps the client could just generate its own transaction id the size of a GUID instead of querying the server for one. If anyone can discredit this, please let me know.

Polik answered 5/6, 2010 at 1:39
It would be absolutely valid to create a GUID on the client and then do PUT /Person/{Guid}. However, I really don't understand what this notion of a "transaction id" is for. - Ballata
Maybe a better term in this case would be "request_id." The idea is that the client can make the same request again if it does not receive a response from the server the first time, and be confident of idempotence. The request would be idempotent because the server could look at the request_id and, if it matched a request that was already made, send a response indicating that the request was a duplicate instead of processing it again and adding a duplicate person to the database. - Polik
@ChrisDutrow, I think your approach is spot on and seems to be in line with PUT's design intention. - Jolynnjon
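
A minimal Python sketch of the client-generated GUID approach: the client mints the key once with uuid.uuid4() and reuses it on every retry of PUT /PERSONS/{guid}. The PersonStore class, the format check, and returning 201 on first creation and 200 afterwards are assumptions for illustration. In CPython, uuid.uuid4() draws its bytes from os.urandom, which sidesteps the seeding concern raised in the question's edit, at least for Python clients.

```python
import uuid
from typing import Dict, Tuple


class PersonStore:
    """Hypothetical server-side store for PUT /PERSONS/{guid}."""

    def __init__(self) -> None:
        self.persons: Dict[str, dict] = {}

    def put(self, guid: str, body: dict) -> Tuple[int, dict]:
        uuid.UUID(guid)  # reject malformed keys (raises ValueError)
        created = guid not in self.persons
        self.persons[guid] = dict(body)  # full replacement, so replays are harmless
        return (201 if created else 200), self.persons[guid]


# Client side: mint the key once, then reuse it for every retry.
store = PersonStore()
guid = str(uuid.uuid4())
first_status, _ = store.put(guid, {"first_name": "Big bubba"})
retry_status, _ = store.put(guid, {"first_name": "Big bubba"})  # after a lost response
```

Because the key is fixed before the first attempt, replaying the same PUT leaves the server in the same state, with no separate id request needed.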

I'm not sure I have a direct answer to your question, but I see a few issues that may lead to answers.

Your first operation is a GET, but it is not a safe operation, since it "creates" a new transaction id. I would suggest POST is a more appropriate verb to use.

You mention that you are concerned about performance issues the user would perceive because of the two round trips. Is this because your user is going to create 500 objects at once, or because you are on a network with massive latency problems?

If two round trips are not a reasonable expense for creating an object in response to a user request, then I would suggest HTTP is not the right protocol for your scenario. If, however, your user needs to create large numbers of objects at once, then we can probably find a better way of exposing resources to enable that.

Ballata answered 2/6, 2010 at 3:42
Yup, you're right, it should be a POST; I changed it. Your response may be highlighting my inexperience with this architecture. I previously built this application using ASP.NET, and it is very, very slow. - Polik
@DutrowLLC I am quite confident that your app is not slow because you are making two round trips when the user creates an object. - Ballata
Yes, ASP.NET was poorly suited to the project and seemed to have very high overhead that slowed everything down. Additionally, I used Entity Framework, which can also be very inefficient. - Polik

Why don't you just use a simple POST, including the payload in your first call? This way you save an extra call and don't have to spawn a transaction:

POST /persons
first_name=foo

The response would be:

HTTP 201 CREATED
...
payload_containing_data_and_auto_generated_id

The id would be generated internally on the server. For simplicity I would go for an artificial primary key (e.g. an auto-increment id from the database).
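
A hedged Python sketch of that suggestion, using an in-memory SQLite database as a stand-in for the real store: AUTOINCREMENT assigns the id, cursor.lastrowid reads it back, and the handler builds the 201 response with a Location header. The post_person function and its return shape are hypothetical. Calling it twice inserts two rows, which is exactly the non-idempotence raised in the comments below.

```python
import sqlite3

# In-memory SQLite database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (id INTEGER PRIMARY KEY AUTOINCREMENT, first_name TEXT)"
)


def post_person(first_name: str) -> tuple:
    """Insert a row and return (status, headers, body) for a 201 response."""
    with conn:  # commits the transaction on success
        cur = conn.execute("INSERT INTO persons (first_name) VALUES (?)", (first_name,))
    new_id = cur.lastrowid  # id assigned by the database
    body = {"id": new_id, "first_name": first_name}
    return 201, {"Location": f"/persons/{new_id}"}, body


print(post_person("foo"))
```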

Zeph answered 2/6, 2010 at 20:16
That is the right way to do it, but that POST request is not idempotent, which seems to bug the OP. - Interstitial
Ah, I see... now the question is why does it need to be idempotent? - Zeph
@manuelaldana it needs to be idempotent because he wants to create only a single person, not two. If the first response gets lost, the client cannot correlate its next request to get the resource details. - Soapwort
