Consequences of POST not being idempotent (RESTful API)

I am wondering if my current approach makes sense or if there is a better way to do it.

I have multiple situations where I want to create new objects and let the server assign an ID to those objects. Sending a POST request appears to be the most appropriate way to do that. However, since POST is not idempotent, the request may get lost, and sending it again may create a second object. Lost requests might also be quite common, since the API is often accessed over mobile networks.

As a result, I decided to split the whole thing into a two-step process:

  1. First, send a POST request to create a new object; the server returns the URI of the new object in the Location header.

  2. Second, perform an idempotent PUT request to the supplied Location to populate the new object with data. If a new object is not populated within 24 hours, the server may delete it through some kind of batch job.
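A minimal in-memory sketch of this two-step flow, assuming a hypothetical `TwoStepStore` class, a hypothetical `/objects/` URL layout and an injectable clock so the 24-hour cleanup can be tested. The POST only allocates an ID, the PUT fills it in and can be repeated safely, and a batch job purges objects that are still empty after the TTL.

```python
import itertools
import time


class TwoStepStore:
    """Hypothetical in-memory server for the POST-then-PUT flow."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.objects = {}                # id -> {"data": ..., "created": ...}
        self._ids = itertools.count(1)   # server-assigned IDs

    def post(self):
        # Step 1: allocate an empty object and report its URI in Location.
        obj_id = str(next(self._ids))
        self.objects[obj_id] = {"data": None, "created": self.clock()}
        return 201, {"Location": f"/objects/{obj_id}"}

    def put(self, location, data):
        # Step 2: idempotent; repeating it stores the same data again.
        obj_id = location.rsplit("/", 1)[-1]
        if obj_id not in self.objects:
            return 404
        self.objects[obj_id]["data"] = data
        return 200

    def purge_unpopulated(self):
        # Batch job: delete objects still empty after the TTL has elapsed.
        now = self.clock()
        stale = [k for k, v in self.objects.items()
                 if v["data"] is None and now - v["created"] > self.ttl]
        for k in stale:
            del self.objects[k]
        return len(stale)


store = TwoStepStore()
status, headers = store.post()
store.put(headers["Location"], {"title": "hello"})
store.put(headers["Location"], {"title": "hello"})  # a retry changes nothing
```

A lost POST response leaves at most one empty object behind, which the purge job removes; a lost PUT is simply resent to the same Location.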

Does that sound reasonable or is there a better approach?

Sophisticated asked 21/12, 2012 at 14:26

The only advantage of POST-creation over PUT-creation is that the server generates the IDs. I don't think it is worth the lack of idempotency (and the resulting need to remove duplicates or empty objects).

Instead, I would use a PUT with a UUID in the URL. Thanks to UUID generators, you can be nearly sure that an ID generated client-side will be unique server-side.
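A minimal in-memory sketch of the PUT-with-UUID approach, assuming a hypothetical `PutStore` server and a hypothetical `/objects/` URL layout. The client picks the ID before its first attempt, so every retry targets the same URL and cannot create a second object. Returning 201 for the first PUT and 200 for a repeat follows the usual HTTP convention for PUT.

```python
import uuid


class PutStore:
    """Hypothetical in-memory server where the client chooses the ID."""

    def __init__(self):
        self.objects = {}

    def put(self, path, data):
        obj_id = path.rsplit("/", 1)[-1]
        created = obj_id not in self.objects
        self.objects[obj_id] = data   # same URL + same body => same end state
        return 201 if created else 200


def new_object_path():
    # Generated once, before the first attempt, and reused on every retry.
    return f"/objects/{uuid.uuid4()}"


store = PutStore()
path = new_object_path()
first = store.put(path, {"title": "hello"})
retry = store.put(path, {"title": "hello"})  # e.g. the first response was lost
```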

Subordinate answered 28/12, 2012 at 13:06
I like that idea... didn't think of that. Thanks - Sophisticated
What if someone were to emulate the front-end (with soapUI, for example) and send gibberish in place of your UUID instead? - Sibbie
@PriiduNeemre Even if it is "gibberish", an ID is an ID. A gibberish ID does not break the system as a whole. However, you are right: if there are several "gibberish" front-ends, they will have ID collisions between them (but not with others). If it is not intentional, you can check on the server side that the ID at least follows the right pattern. If it is intentional, you can set up authentication, authorization and accounting to prevent it from happening again. - Upanchor
By gibberish I meant something more along the lines of "duck-you-api-admin", which is not something you want to see in your (or your client's) database :p. Of course, you might set up CHECK constraints and a regex to verify the correctness of the UUID syntax, but then there would still be the uniqueness problem. So in the end, I'm not sure doing this stuff on the client side is really worth the trouble (IMHO). - Sibbie
In this case, there is certainly more server-side validation to be done, i.e., you should probably have logic to ensure the ID conforms to some standard, say UUID v4. Also, on the client side, clients will need to perform retry logic with a new ID in case a duplicate ID is in fact generated. While very unlikely, it is possible. - Synchronous
@Synchronous A "duplicate ID" is very unlikely: having a 50% probability of at least one collision (...) would be equivalent to generating 1 billion UUIDs per second for about 85 years (see Wikipedia). - Upanchor
@Aurélien For one, you are assuming the developers using your API are generating good UUIDs. If you don't control the client, there is no guarantee that they are not generating duplicates far more often than that. Even if they are doing a great job and creating good UUIDs, there is still a chance. You have to consider what losing a resource because of a duplicate UUID might mean: if two clients generate the same UUID, the second client will overwrite the first client's data. In something like a banking system, this could be extremely bad. - Synchronous
There is another large advantage to using POST for resource creation and reserving PUT for updates: if you are also dealing with concurrency and you only have a single PUT request, it gets very hard to work out the correct response for a client that is retrying without realizing its first attempt succeeded and changed the version. You don't want to slap them with a 409 or 412, since it was their original request that actually succeeded. You need to be able to tell updates from creation; that's why I always use an idempotent POST for creating. - Trudge
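The comments raise three server-side concerns: rejecting IDs that are not well-formed, not letting a second client silently overwrite the first, and letting the client retry with a fresh ID when a collision does happen. For scale, a version-4 UUID has 122 random bits, so by the birthday bound roughly 2.7 × 10^18 of them are needed before a collision becomes a coin flip; in practice the risk comes mostly from badly written or malicious clients. Below is a minimal in-memory sketch with hypothetical names (`is_valid_uuid4`, `CreateOnlyStore`, `create_with_retry`). It refuses to overwrite an existing ID with a 412, in the spirit of a PUT sent with `If-None-Match: *`, and treats a repeat carrying an identical body as the client's own retry; that last rule is an assumption of this sketch, not something proposed in the thread.

```python
import uuid


def is_valid_uuid4(value):
    """Accept only canonical (lowercase, hyphenated) version-4 UUID strings."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == value


class CreateOnlyStore:
    """Hypothetical server: PUT with a client-chosen ID that must not overwrite."""

    def __init__(self):
        self.objects = {}

    def put_create(self, obj_id, data):
        if not is_valid_uuid4(obj_id):
            return 400                  # reject "gibberish" IDs
        if obj_id in self.objects:
            # Same body: most likely the client's own retry, so report success.
            # Different body: a real collision; refuse instead of overwriting.
            return 200 if self.objects[obj_id] == data else 412
        self.objects[obj_id] = data
        return 201


def create_with_retry(store, data, max_attempts=3,
                      new_id=lambda: str(uuid.uuid4())):
    """Client side: on a collision, pick a fresh ID and try again."""
    for _ in range(max_attempts):
        obj_id = new_id()
        status = store.put_create(obj_id, data)
        if status in (200, 201):
            return obj_id
        if status != 412:
            raise ValueError(f"unexpected status {status}")
    raise RuntimeError("no free ID after several attempts")
```

The identical-body shortcut can misfire if two different clients happen to send the same body under the same ID; authenticating clients and scoping IDs per client, as suggested in the comments, closes that gap.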

Well, it all depends. To start with, you should talk about URIs, resources and representations, and not be concerned with objects.

The POST method is designed for non-idempotent requests, or requests with side effects, but it can also be used for idempotent requests.

on POST of form data to /some_collection/

normalize the natural key of your data (e.g. "lowercase" the Title field for a blog post)
calculate a suitable hash value (e.g. in the simplest case, your normalized field value)
look up the resource by hash value
if none found then
    generate a server identity, create the resource
    respond => "201 Created", "Location: /some_collection/<new_id>"
if found, but no updates should be carried out due to app logic
    respond => "302 Found" or "303 See Other", "Location: /some_collection/<id>"
    (the client will need to GET that resource, which might include fields required for updates, like version numbers)
if found, but updates may occur
    respond => "307 Temporary Redirect", "Location: /some_collection/<id>"
    (like a 302, but the client must reuse the original HTTP method and may follow it automatically)
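A runnable Python sketch of the steps above, assuming a hypothetical `Collection` class, an in-memory dictionary in place of a database, and a single hypothetical `title` field as the natural key. The `allow_updates` flag is a stand-in for whatever application logic decides between the 303 and 307 branches.

```python
import itertools


class Collection:
    """Hypothetical /some_collection/ handler using a natural-key hash."""

    def __init__(self, allow_updates=False):
        self.allow_updates = allow_updates
        self.by_hash = {}               # natural-key hash -> resource id
        self.resources = {}             # resource id -> stored form data
        self._ids = itertools.count(1)  # server-generated identities

    @staticmethod
    def natural_key_hash(form):
        # Simplest case: the normalized field value itself.
        return form["title"].strip().lower()

    def post(self, form):
        key = self.natural_key_hash(form)
        existing = self.by_hash.get(key)
        if existing is None:
            new_id = next(self._ids)
            self.by_hash[key] = new_id
            self.resources[new_id] = dict(form)
            return 201, {"Location": f"/some_collection/{new_id}"}
        location = {"Location": f"/some_collection/{existing}"}
        if self.allow_updates:
            return 307, location    # client repeats its request, same method, there
        return 303, location        # client should GET the existing resource
```

Because a repeated POST maps to the same hash, it is redirected to the existing resource instead of creating a second one.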

A suitable hash function might be as simple as some concatenated fields, or, for large fields or values, a truncated MD5 function could be used. See [hash function] for more details.
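The two hash options mentioned above might look like this; the `author` and `title` field names are hypothetical. MD5 serves only as a compact fingerprint here, not as a security measure, and truncating the digest slightly raises the chance of collisions. The `|` separator assumes field values do not themselves contain `|`.

```python
import hashlib


def concat_key(form, fields=("author", "title")):
    # Cheap option: join the normalized natural-key fields.
    return "|".join(str(form[f]).strip().lower() for f in fields)


def md5_key(form, fields=("author", "title"), length=16):
    # For large fields: a truncated MD5 digest of the same normalized string.
    return hashlib.md5(concat_key(form, fields).encode("utf-8")).hexdigest()[:length]
```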

I've assumed that you:

  • need an identity value different from the hash value
  • cannot change the data fields used for identity
Serration answered 2/1, 2013 at 12:43
Careful here: as @Serration points out, the assumption is that 'data fields used for identity can't be changed'. This is a big deal if you don't have a unique set of data fields that cannot be changed by the user. - Synchronous

Your method of generating IDs at the server, in the application, in a dedicated request-response, is a very good one! Uniqueness is very important, but clients, like suitors, are going to keep repeating the request until they succeed, or until they get a failure they're willing to accept (unlikely). So you need to get uniqueness from somewhere, and you only have two options: the client, with a GUID as Aurélien suggests, or the server, as you suggest. I happen to like the server option. Seed (identity) columns in relational DBs are a readily available source of uniqueness with zero risk of collisions. Around 2000, I read an article advocating this solution, called something like "Simple Reliable Messaging with HTTP", so this is an established approach to a real problem.
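As an illustration of using a seed column as the source of uniqueness, here is a sketch with Python's built-in `sqlite3` module, assuming a hypothetical `ids` table and a hypothetical `allocate_id()` that would back a "POST /ids" endpoint. With `AUTOINCREMENT`, SQLite does not reuse ids, even after rows are deleted, so discarding unused identities is harmless.

```python
import sqlite3

# Hypothetical id-allocation table backed by an auto-incrementing key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ids (id INTEGER PRIMARY KEY AUTOINCREMENT, issued_at TEXT)"
)


def allocate_id():
    """Body of a hypothetical 'POST /ids' handler: one row, one fresh id."""
    with conn:  # commits the transaction when the block succeeds
        cur = conn.execute("INSERT INTO ids (issued_at) VALUES (datetime('now'))")
    return cur.lastrowid
```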

Reading REST stuff, you could be forgiven for thinking a bunch of teenagers had just inherited Elvis's mansion. They're excitedly discussing how to rearrange the furniture, and they're hysterical at the idea that they might need to bring something from home. The use of POST is recommended because it's there, without anyone ever broaching the problems with non-idempotent requests.

In practice, you will likely want to make sure all unsafe requests to your API are idempotent, with the necessary exception of identity-generation requests, which, as you point out, don't matter. Generating identities is cheap, and unused ones are easily discarded. As a nod to REST, remember to get your new identity with a POST, so it's not cached and repeated all over the place.

Regarding the sterile debate about what idempotent means, I say it needs to cover everything: successive requests should generate no additional effects and should receive the same response as the first processed request. To implement this, you will want to store all server responses so they can be replayed, and your IDs will identify actions, not just resources. You'll be kicked out of Elvis's mansion, but you'll have a bombproof API.
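A minimal sketch of the "store responses and replay them" idea, assuming a hypothetical `ActionServer` with in-memory storage. The client first POSTs to obtain an empty action ID, then sends its unsafe request (here a DELETE) against that action. A repeat of the same action gets the stored response back without being reprocessed, so a retried DELETE sees the same 200 as the first one.

```python
import itertools


class ActionServer:
    """Hypothetical server: unsafe requests address an action id."""

    def __init__(self):
        self._action_ids = itertools.count(1)
        self.responses = {}   # action id -> stored response (None until used)
        self.resources = {}

    def new_action(self):
        # POST: create an empty action; cheap, and unused ones can be discarded.
        action_id = next(self._action_ids)
        self.responses[action_id] = None
        return action_id

    def delete(self, action_id, resource_id):
        if action_id not in self.responses:
            return (404, "unknown action")
        if self.responses[action_id] is not None:
            return self.responses[action_id]   # replay: no reprocessing
        if resource_id in self.resources:
            del self.resources[resource_id]
            response = (200, f"deleted {resource_id}")
        else:
            response = (404, f"no such resource {resource_id}")
        self.responses[action_id] = response
        return response
```

For brevity, this sketch does not check that a replayed request matches the one originally sent under that action ID; a real server would probably want to.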

Centavo answered 10/2, 2016 at 16:23
Thanks for your input on the matter. So, for your last point, you suggest that an idempotent DELETE should always return 200, not 200 on the first call and 404 on additional calls, as some people say when they focus on server state and consider return codes irrelevant to that question. - Sophisticated
Exactly. According to ME, all unsafe requests should start by requesting an empty action on a resource; then the substantive unsafe request addresses the action, not the resource. This lets the server resend the response to a previously seen request without having to reprocess it. I have a very short paper on this that I'd love you to proofread if you're interested. bbsimonbb at gmail dot com - Centavo
Sure... feel free to send it to mibollma at outlook dot com - Sophisticated
Instead of requiring two round trips to the server, your client could include a client-generated, (client-)unique ID with the POST request. The back end stores this ID with the created object. When the server receives a POST request and finds an object created within, say, the past five minutes with that ID, it recognizes the request as a repeat, does not create a new object, and returns the existing one. Of course, you would need to make sure that an authenticated client cannot spoof the unique IDs of other clients and thereby retrieve data posted by those clients. - Martres
I'd suggest not basing anything on duration. With IDs and stored responses, you don't need to. The ID is how you recognise repeats. - Centavo
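A sketch of the client-generated request ID approach from these comments, following the last comment's advice to recognise repeats by ID rather than by a time window. The `IdempotentPost` class and its arguments are hypothetical. Keying the log on the authenticated client as well as the request ID addresses the spoofing concern, because one client can never match another client's entry.

```python
import itertools


class IdempotentPost:
    """Hypothetical POST handler that recognises repeats by request id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.seen = {}      # (client_id, request_id) -> resource id
        self.objects = {}

    def post(self, client_id, request_id, data):
        # Scoping by authenticated client stops one client from replaying
        # another client's request id to read their data.
        key = (client_id, request_id)
        if key in self.seen:
            obj_id = self.seen[key]
            return 200, obj_id, self.objects[obj_id]   # repeat: no new object
        obj_id = next(self._ids)
        self.objects[obj_id] = data
        self.seen[key] = obj_id
        return 201, obj_id, data
```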

But now you have two requests that can be lost? And the POST can still be repeated, creating another resource instance. Don't over-think stuff. Just have the batch process look for dupes. Possibly have some "access" count statistics on your resources to see which of the dupe candidates was the result of an abandoned post.

Another approach: screen incoming POSTs against some log to see whether each one is a repeat. Repeats should be easy to find: if the body of a request is the same as that of a request received just x time ago, consider it a repeat. You could also check extra parameters, such as the originating IP, the same authentication, ...
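A sketch of this screening idea, assuming a hypothetical `RepeatScreen` that keeps recent (sender, body hash) pairs for a configurable window; the sender could be the authenticated user or the originating IP, as suggested above. A user who deliberately posts the same content twice within the window will also be treated as a repeat.

```python
import hashlib
import time


class RepeatScreen:
    """Hypothetical log of recent POSTs, keyed by sender and body hash."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.recent = {}   # (sender, body digest) -> time last seen

    def is_repeat(self, sender, body: bytes) -> bool:
        now = self.clock()
        # Forget entries older than the window so the log stays small.
        self.recent = {k: t for k, t in self.recent.items()
                       if now - t <= self.window}
        key = (sender, hashlib.sha256(body).hexdigest())
        repeat = key in self.recent
        self.recent[key] = now
        return repeat
```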

Elora answered 21/12, 2012 at 14:52
You are right that now I can lose two requests. My thinking is: losing the first one is no problem, because it's an uninitialized object, which can easily be detected as such. Losing the second one is no problem, because the request is idempotent and can be repeated. What I want to avoid is two or more objects appearing on the client's side. But you are right... having some screening algorithm on the server might work just as well :) - Sophisticated
You suggest not over-thinking stuff, then you over-think. The solution proposed in the question is more elegant than this one. Are you trying to maintain REST purity? - Centavo

No matter what HTTP method you use, it is theoretically impossible to make an idempotent request without generating a unique identifier client-side, whether it is used temporarily (as part of some request-checking system) or as the permanent server ID. A lost HTTP request will not create a duplicate; the concern is a request that reaches the server and succeeds while its response never makes it back to the client.

If the end client can easily delete duplicates and they don't cause inherent data conflicts, it is probably not a big enough deal to develop an ad hoc duplicate-prevention system. Use POST for the request, and send the client back a 201 status code with the server-generated unique ID in the body of the response. If you have data showing that duplicates are a frequent occurrence, or any duplicate causes significant problems, I would use PUT and create the unique ID client-side. Use the client-created ID as the database ID; there is no advantage to creating an additional unique ID on the server.

Dwinnell answered 31/12, 2012 at 19:26
Thanks for your response. I have only two comments. In the case of 201, I think it would be slightly nicer to use the Location header, providing a URI to the newly created resource, instead of using the body. The only advantage I see to creating the ID on the server instead of using a UUID is that the native primary-key mechanism of the server database can be used. In any case, as you said, creating two IDs doesn't seem to be useful. - Sophisticated
I think we both agree with @aurelien that creating the unique ID client-side is probably the way to go. Using the Location header and the entity body for a 201 is not either/or. I would do both (and more if appropriate). The spec says as much: "The newly created resource can be referenced by the URI(s) returned in the entity of the response, with the most specific URI for the resource given by a Location header field." - Dwinnell

I think you could also collapse the creation and update requests into a single request (upsert). To create a new resource, the client POSTs to a “factory” resource, located for example at /factory-url-name, and the server returns the URI of the new resource.

Piddle answered 21/12, 2012 at 14:37
I am not sure I fully understand how he can collapse it into one request. Would you mind updating the answer with a little more detail? - Synchronous

Why don't you use a request ID at your originating point? The originating point should do two things. First, send a GET request for request_id=2 to see whether its request has been applied (for example, a response saying the person was created, and created as part of request_id=2). Because the request ID is stored in the DB, this ensures the originating system knows which request was last executed. Second, if the originating point finds that the last request is still at 1, not yet 2, it may try again with 3, to cover the case where just the GET response got lost but request 2 was in fact created in the DB. You can add a number of retries for your GET request, a wait time before firing another GET, and similar mechanisms.

Denature answered 25/4, 2022 at 13:08
Oh shoot, I didn't notice I'm replying to a 2012 thread. - Denature
