REST API - Bulk Create or Update in single request [closed]
C

4

123

Let's assume there are two resources, Binder and Doc, with an association relationship, meaning that Doc and Binder each stand on their own. A Doc might or might not belong to a Binder, and a Binder might be empty.

If I want to design a REST API that allows a user to send a collection of Docs, IN A SINGLE REQUEST, like the following:

{
  "docs": [
    {"doc_number": 1, "binder": 1}, 
    {"doc_number": 5, "binder": 8},
    {"doc_number": 6, "binder": 3}
  ]
}

And for each doc in the docs,

  • If the doc exists then assign it to Binder
  • If the doc doesn't exist, create it and then assign it

I'm really confused as to how this should be implemented:

  • What HTTP method to use?
  • What response code must be returned?
  • Does this even qualify as REST?
  • What would the URI look like? /binders/docs?
  • When handling a bulk request, what if a few items raise an error but the others go through? What response code must be returned? Should the bulk operation be atomic?
Chantalchantalle answered 19/2, 2015 at 0:33 Comment(1)
Unfortunately, as discussed here, HTTP isn't ideal for batch processing. HTTP at its core is just a remote document management protocol with a focus on single documents. We might send a single "document" to a server and treat it as if it's affecting multiple documents, though you'd effectively bypass any (intermediary) cache in that process, as you don't target those resources individually. - Pendulum
E
78

I think that you could use a POST or PATCH method to handle this, since they are typically designed for this.

  • A POST method is typically used to add an element to a list resource, but you can also support several actions for this method. See this answer: Update an entire resource collection in a REST way. You can also support different representation formats for the input (depending on whether it corresponds to an array or a single element).

    In this case, you don't need to define your own format to describe the update.

  • A PATCH method is also suitable, since the corresponding requests describe a partial update. According to RFC5789 (https://www.rfc-editor.org/rfc/rfc5789):

    Several applications extending the Hypertext Transfer Protocol (HTTP) require a feature to do partial resource modification. The existing HTTP PUT method only allows a complete replacement of a document. This proposal adds a new HTTP method, PATCH, to modify an existing HTTP resource.

    In this case, you have to define your own format to describe the partial update.

I think that in this case, POST and PATCH are quite similar, since you don't really need to describe the operation to perform on each element. I would say that it depends on the format of the representation to send.

The case of PUT is a bit less clear. When using PUT, you should provide the whole list, because the representation provided in the request replaces the current representation of the list resource.

You can have two options regarding the resource paths.

  • Using the resource path for doc list

In this case, you need to explicitly provide the link between each doc and its binder in the representation you send in the request.

A sample route for this is /docs.

With this approach, the content for a POST could be:

[
    { "doc_number": 1, "binder": 4, (other fields in the case of creation) },
    { "doc_number": 2, "binder": 4, (other fields in the case of creation) },
    { "doc_number": 3, "binder": 5, (other fields in the case of creation) },
    (...)
]
  • Using sub resource path of binder element

In addition, you could also leverage sub-routes to describe the link between docs and binders. The association between a doc and a binder then no longer has to be specified in the request content.

A sample route for this is /binder/{binderId}/docs. In this case, sending a list of docs with POST or PATCH will attach the docs to the binder with identifier binderId, after creating each doc if it doesn't exist.

With this approach, the content for a POST could be:

[
    { "doc_number": 1, (other fields in the case of creation) },
    { "doc_number": 2, (other fields in the case of creation) },
    { "doc_number": 3, (other fields in the case of creation) },
    (...)
]
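
To make the sub-resource variant concrete, here is a minimal sketch of what a handler behind POST /binder/{binderId}/docs might do. The in-memory stores and the function name attach_docs are assumptions for illustration only; a real service would use its own persistence layer and web framework.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stores standing in for a real database.
docs: Dict[int, dict] = {}            # doc_number -> doc fields
binder_docs: Dict[int, Set[int]] = {}  # binder id -> doc_numbers attached to it


def attach_docs(binder_id: int, payload: List[dict]) -> List[dict]:
    """Create each doc if it is missing, then attach it to the binder."""
    results = []
    for item in payload:
        number = item["doc_number"]
        created = number not in docs
        if created:
            docs[number] = dict(item)  # create with the provided fields
        binder_docs.setdefault(binder_id, set()).add(number)
        results.append({"doc_number": number, "created": created})
    return results
```

The binder id comes from the URI, so the payload only carries the docs themselves. The returned list tells the client which docs were newly created and which already existed.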

Regarding the response, it's up to you to define the level of detail of the response and the errors to return. I see two levels: the status level (global) and the payload level (finer-grained). It's also up to you to decide whether all the inserts / updates corresponding to your request must be atomic or not.

  • Atomic

In this case, you can leverage the HTTP status. If everything goes well, you return a status 200. If not, you return another status, such as 400 if the provided data isn't correct (for example, an invalid binder id).

  • Non atomic

In this case, a status 200 is returned and it's up to the response representation to describe what was done and where any errors occurred. Elasticsearch has an endpoint in its REST API for bulk updates. This could give you some ideas at this level: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/bulk.html.
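
As a rough illustration of the difference between the atomic and non-atomic options, the sketch below validates every item first and then either rejects the whole request (atomic) or applies the valid items and reports per-item outcomes (non-atomic). KNOWN_BINDERS, bulk_assign and the response shape are all made up for this example; they are not taken from Elasticsearch or any other API.

```python
from typing import Dict, List, Tuple

KNOWN_BINDERS = {1, 3, 8}  # hypothetical set of valid binder ids


def bulk_assign(store: Dict[int, int], items: List[dict], atomic: bool) -> Tuple[int, dict]:
    """Upsert doc -> binder links; `store` maps doc_number to binder id."""
    results = []
    for item in items:
        result = {"doc_number": item["doc_number"], "status": "ok"}
        if item["binder"] not in KNOWN_BINDERS:
            result["status"] = "error"
            result["error"] = f"unknown binder {item['binder']}"
        results.append(result)

    failed = [r for r in results if r["status"] == "error"]
    if atomic and failed:
        # Atomic mode: reject the whole request and write nothing.
        return 400, {"errors": failed}

    for item, result in zip(items, results):
        if result["status"] == "ok":
            store[item["doc_number"]] = item["binder"]
    # Full success, or non-atomic mode: 200 overall, per-item outcome in the body.
    return 200, {"items": results}
```

In non-atomic mode the client has to inspect the body to find out which items failed, which is the trade-off described above.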

  • Asynchronous

You can also process the provided data asynchronously. In this case, the HTTP status returned will be 202. The client then needs to poll an additional resource to see what happened.
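
The asynchronous variant could look roughly like the following sketch. The /jobs/{id} status resource, the jobs registry and the three function names are assumptions for illustration; run_job stands in for whatever background worker would really process the batch.

```python
import uuid
from typing import Dict, List, Tuple

jobs: Dict[str, dict] = {}  # hypothetical in-memory job registry


def submit_bulk(items: List[dict]) -> Tuple[int, dict, dict]:
    """Accept the batch and return (status, headers, body) immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"state": "pending", "items": items, "results": None}
    return 202, {"Location": f"/jobs/{job_id}"}, {"job_id": job_id}


def run_job(job_id: str) -> None:
    """Stand-in for a background worker that processes the batch."""
    job = jobs[job_id]
    job["results"] = [
        {"doc_number": item["doc_number"], "status": "ok"} for item in job["items"]
    ]
    job["state"] = "done"


def get_job(job_id: str) -> Tuple[int, dict]:
    """What the client polls, e.g. GET /jobs/{job_id}."""
    job = jobs.get(job_id)
    if job is None:
        return 404, {"error": "unknown job"}
    return 200, {"state": job["state"], "results": job["results"]}
```

The 202 response only says the batch was accepted; the per-item outcome is available from the status resource once the job is done.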

Before finishing, I would also like to point out that the OData specification addresses relations between entities with a feature named navigation links. Perhaps you could have a look at this ;-)

The following link can also help you: https://templth.wordpress.com/2014/12/15/designing-a-web-api/.

Hope it helps you, Thierry

Epicureanism answered 25/2, 2015 at 13:58 Comment(1)
I have a follow-on question. I opted for flat routes without a nested sub-resource. To get all docs I call GET /docs, and to retrieve all docs within a particular binder, GET /docs?binder_id=x. To delete a subset of the resources, would I call DELETE /docs?binder_id=x, or should I call DELETE /docs with a {"binder_id": x} in the request body? Would you ever use PATCH /docs?binder_id=x for a batch update, or just PATCH /docs and pass pairs? - Pistoia
C
44

You probably will need to use POST or PATCH, because it is unlikely that a single request that updates and creates multiple resources will be idempotent.

Doing PATCH /docs is definitely a valid option. You might find using the standard patch formats tricky for your particular scenario. Not sure about this.

You could use 200. You could also use 207 (Multi-Status).

This can be done in a RESTful way. The key, in my opinion, is to have some resource that is designed to accept a set of documents to update/create.

If you use the PATCH method I would think your operation should be atomic. i.e. I wouldn't use the 207 status code and then report successes and failures in the response body. If you use the POST operation then the 207 approach is viable. You will have to design your own response body for communicating which operations succeeded and which failed. I'm not aware of a standardized one.
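
Since there is no standardized body for this, here is one possible shape, purely as a sketch: each item carries its own HTTP-style status, and the overall status is 200 when everything succeeded and 207 otherwise. The field names (succeeded, failed, items, message) and the function name are invented for this example.

```python
from typing import List, Tuple


def build_bulk_response(outcomes: List[Tuple[int, int, str]]) -> Tuple[int, dict]:
    """outcomes: (doc_number, per-item HTTP status, message) tuples."""
    items = [
        {"doc_number": number, "status": status, "message": message}
        for number, status, message in outcomes
    ]
    failed = sum(1 for _, status, _ in outcomes if status >= 400)
    # 200 when every item succeeded, 207 as soon as at least one item failed.
    overall = 200 if failed == 0 else 207
    body = {"succeeded": len(items) - failed, "failed": failed, "items": items}
    return overall, body
```

A client that does not understand 207 will still treat it as a success (2xx), so it must look at the failed counter or at the per-item statuses to detect partial failure.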

Croft answered 24/2, 2015 at 20:22 Comment(3)
Thank you so much. By "This can be done in a RESTful way", do you mean the Update and Create must be done separately? - Chantalchantalle
@norbertpy Performing some kind of write operation on a resource can cause other resources to be updated and created from a single request. REST has no issue with that. My choice of phrase was because some frameworks implement bulk operations by serializing HTTP requests into multi-part documents and then sending the serialized HTTP requests as a batch. I think that approach violates the resource identification REST constraint. - Croft
207 is a status code defined by WebDAV. Unfortunately, a generic HTTP client might not be aware of that status code. Moreover, as caching is one of the few constraints REST has, any of the presented "workarounds" would effectively bypass caching completely. HTTP at its core is designed around single document exchanges rather than batch processing. - Pendulum
P
22

PUTting

PUT /binders/{id}/docs Create or update, and relate a single document to a binder

e.g.:

PUT /binders/1/docs HTTP/1.1
{
  "docNumber" : 1
}

PATCHing

PATCH /docs Create docs if they do not exist and relate them to binders

e.g.:

PATCH /docs HTTP/1.1
[
    { "op" : "add", "path" : "/binder/1/docs", "value" : { "doc_number" : 1 } },
    { "op" : "add", "path" : "/binder/8/docs", "value" : { "doc_number" : 5 } },
    { "op" : "add", "path" : "/binder/3/docs", "value" : { "doc_number" : 6 } }
] 

I'll include additional insights later, but in the meantime if you want to, have a look at RFC 5789, RFC 6902 and William Durand's Please. Don't Patch Like an Idiot blog entry.
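
As a rough sketch of how a server might process the PATCH /docs body above: the code below accepts only "add" operations whose path looks like /binder/{id}/docs, validates the whole list first, and then applies it, so the patch is all-or-nothing. This is an interpretation of the example, not an RFC 6902 implementation (JSON Patch paths are JSON Pointers into a document, not routes), and the names apply_patch and BINDER_DOCS_PATH are made up.

```python
import re
from typing import Dict, List, Set

# Assumed path shape, taken from the example operations above.
BINDER_DOCS_PATH = re.compile(r"^/binder/(\d+)/docs$")


def apply_patch(binders: Dict[int, Set[int]], docs: Set[int], ops: List[dict]) -> None:
    """Apply the operations atomically: validate everything, then write."""
    planned = []
    for op in ops:
        match = BINDER_DOCS_PATH.match(op.get("path", ""))
        if op.get("op") != "add" or match is None:
            raise ValueError(f"unsupported operation: {op!r}")
        planned.append((int(match.group(1)), op["value"]["doc_number"]))

    # Nothing has been written so far, so a ValueError above leaves state untouched.
    for binder_id, doc_number in planned:
        docs.add(doc_number)  # creates the doc if it does not exist yet
        binders.setdefault(binder_id, set()).add(doc_number)
```

A real handler would catch the ValueError and answer with a 4xx status instead of applying part of the patch.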

Peacemaker answered 21/2, 2015 at 19:40 Comment(1)
Sometimes the client needs a bulk operation and doesn't want to care whether the resource is there or not. As I said in the question, the client wants to send a bunch of docs and associate them with binders. The client wants to create binders if they don't exist and make the association if they do. In ONE SINGLE BULK request. - Chantalchantalle
B
19

In a project I worked on, we solved this problem by implementing something we called 'Batch' requests. We defined a path /batch where we accepted JSON in the following format:

[  
   {
      path: '/docs',
      method: 'post',
      body: {
         doc_number: 1,
         binder: 1
      }
   },
   {
      path: '/docs',
      method: 'post',
      body: {
         doc_number: 5,
         binder: 8
      }
   },
   {
      path: '/docs',
      method: 'post',
      body: {
         doc_number: 6,
         binder: 3
      }
   },
]

The response has the status code 207 (Multi-Status) and looks like this:

[  
   {
      path: '/docs',
      method: 'post',
      body: {
         doc_number: 1,
         binder: 1
      },
      status: 200
   },
   {
      path: '/docs',
      method: 'post',
      body: {
         error: {
            msg: 'A document with doc_number 5 already exists',
            ...
         }
      },
      status: 409
   },
   {
      path: '/docs',
      method: 'post',
      body: {
         doc_number: 6,
         binder: 3
      },
      status: 200
   },
]

You could also add support for headers in this structure. We also implemented something that proved useful: variables shared between the requests in a batch, meaning we can use the response from one request as input to another.
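
For what it's worth, a /batch endpoint like the one described can be sketched as a small dispatcher: look up a handler for each sub-request, call it, and collect the per-item status in the same shape as the response above. The handler, the routes table and the function names are hypothetical, and the variables-between-requests feature is left out.

```python
from typing import Callable, Dict, List, Set, Tuple

Handler = Callable[[dict], Tuple[int, dict]]


def make_docs_handler(existing: Set[int]) -> Handler:
    """Hypothetical POST /docs handler that rejects duplicates with 409."""
    def post_doc(body: dict) -> Tuple[int, dict]:
        number = body["doc_number"]
        if number in existing:
            msg = f"A document with doc_number {number} already exists"
            return 409, {"error": {"msg": msg}}
        existing.add(number)
        return 200, body
    return post_doc


def run_batch(routes: Dict[Tuple[str, str], Handler], requests: List[dict]) -> Tuple[int, List[dict]]:
    """Dispatch each sub-request and collect per-item statuses under a 207."""
    responses = []
    for req in requests:
        handler = routes.get((req["method"].lower(), req["path"]))
        if handler is None:
            status, body = 404, {"error": {"msg": "no such route"}}
        else:
            status, body = handler(req.get("body", {}))
        responses.append(
            {"path": req["path"], "method": req["method"], "body": body, "status": status}
        )
    return 207, responses
```

Each sub-request succeeds or fails on its own here, so this is the non-atomic flavour of bulk processing.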

Facebook and Google have similar implementations:
https://developers.google.com/gmail/api/guides/batch
https://developers.facebook.com/docs/graph-api/making-multiple-requests

When you want to create or update a resource with the same call I would use either POST or PUT depending on the case. If the document already exists, do you want the entire document to be:

  1. Replaced by the document you send in (i.e. missing properties in request will be removed and already existing overwritten)?
  2. Merged with the document you send in (i.e. missing properties in request will not be removed and already existing properties will be overwritten)?

If you want the behaviour of alternative 1 you should use PUT (not POST), and if you want the behaviour of alternative 2 you should use PATCH (not PUT).
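
The two alternatives boil down to replace versus merge. A tiny sketch, with made-up function names and a shallow merge only (nested properties would need a recursive merge):

```python
def replace_doc(existing: dict, incoming: dict) -> dict:
    """Alternative 1 (PUT-style): the incoming document replaces the stored one."""
    return dict(incoming)


def merge_doc(existing: dict, incoming: dict) -> dict:
    """Alternative 2 (PATCH-style): incoming fields overwrite, missing ones are kept."""
    return {**existing, **incoming}
```

For a stored document {"doc_number": 1, "binder": 1, "title": "Spec"} and an incoming {"doc_number": 1, "binder": 8}, the replace variant drops title while the merge variant keeps it.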

http://restcookbook.com/HTTP%20Methods/put-vs-post/

Breast answered 16/5, 2017 at 16:35 Comment(4)
I like this answer for the proof of concept as well as the Google and Facebook links, but I disagree with the ending part about POST or PUT. Of the 2 cases this answer mentions, the first one should be PUT, and the second should be PATCH. - Nabala
@RayLuo, can you explain why we need PATCH in addition to POST and PUT? - Breast
Because that's what PATCH was invented for. You can read this definition and see how PUT and PATCH match your 2 bullet points. - Nabala
@DavidBerg, it seems that Google has preferred another approach to processing batch requests, i.e., separating the header and body of each sub-request into the corresponding part of a main request, with a boundary like --batch_xxxx. Are there any crucial differences between the solutions of Google and Facebook? Additionally, about "use the response from one request as input to another", it sounds very interesting. Would you mind sharing more details, or the kind of scenario where it should be used? - Serous
