ETags and collections

Asked 14/2, 2015 at 18:10 Answered 14/4 at 16:31

Many REST APIs provide the ability to search for resources.

For example, resources of type A may be fetched using the following HTTP request:

GET /A?prop1={value1}&prop2={value2}

I'm using optimistic locking and therefore would like to return a version for every returned resource of type A. Until now, I used the ETag header when fetching only one resource using its ID.

Is there a standard HTTP way to return a version for each of several resources in the same response? If not, should I include the versions in the body?

Thanks, Mickael

EDIT: I found on the web that the ETag is often generated by computing a hash of part of the reply. This approach fits my case, since a hash of the returned collection can be computed. However, if the client decides to update one of the elements in the collection, which ETag should it put in the If-Match header? I'm thinking that including the ETags of the individual elements in the body is the only solution...

Shinn answered 14/2, 2015 at 18:10 Comment(3)

What are "multiple resources" in your API? – Vestavestal 14/2, 2015 at 19:21

A collection of resources of type A. However, this collection is not a resource in itself. It contains resources which are independent of each other. Therefore, each of these resources has its own version. – Shinn 14/2, 2015 at 19:24

If my understanding is correct, in the case of multiple resources your response won't have an ETag; instead the version of each resource will be part of the response body, and the HTTP PUT request for each resource will include the version info in the "if-modified-since" header. Right? – Sorrel 26/10, 2015 at 12:25

I would adopt one of these options:

- Make ETags weak by default and generate them from the resource's current state, not from its representation in the HTTP response payload. With that, I can return a valid ETag for each resource in the body of the collection query response, in addition to the ETag for the whole collection in the response header.
- Forget If-Match and ETags for this case and use If-Unmodified-Since with a Last-Modified value supplied as a property of each resource. That way I can keep strong ETags, but clients can still update one item based on the collection response without another request to the resource itself.
- Allow updates via PATCH on the collection resource itself, using the If-Match header with the ETag for the whole collection. This probably won't work very well if there are a lot of concurrent changes, but it's a reasonable approach.
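
To make the first option concrete, here is a minimal sketch. It assumes each resource is a dict with "id" and "version" fields; the function names (weak_etag, collection_response, update) are made up for illustration and do not come from any framework. One caveat: the HTTP spec calls for strong comparison on If-Match, so a server that accepts weak tags there, as this sketch does, is being deliberately lenient.

```python
import hashlib


def weak_etag(resource: dict) -> str:
    """Weak ETag built from stored state (id + version), not from a serialized payload."""
    state = f"{resource['id']}:{resource['version']}"
    digest = hashlib.sha256(state.encode()).hexdigest()[:16]
    return f'W/"{digest}"'


def collection_response(resources: list[dict]) -> tuple[dict, dict]:
    """Return (headers, body): per-item ETags in the body, collection ETag in the header."""
    items = [{**r, "etag": weak_etag(r)} for r in resources]
    # The collection tag is a hash over the member tags, so it changes
    # whenever a member is added, removed, or modified.
    joined = ",".join(item["etag"] for item in items)
    collection_tag = 'W/"' + hashlib.sha256(joined.encode()).hexdigest()[:16] + '"'
    return {"ETag": collection_tag}, {"items": items}


def update(store: dict, resource_id, if_match: str, changes: dict) -> int:
    """Apply changes only if the client's tag matches the current state.

    Returns an HTTP status code: 204 on success, 412 on a stale tag.
    """
    current = store[resource_id]
    if if_match != weak_etag(current):
        return 412  # Precondition Failed: someone else changed the resource
    current.update(changes)
    current["version"] += 1
    return 204
```

A client would take the "etag" of one item from the collection body and send it back in If-Match when updating that item, without first fetching the item on its own.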

Grantor answered 26/3, 2015 at 21:57 Comment(0)

I think it depends a little on the number of resources, the amount of data and how many requests you make, if the goal is to reduce bandwidth. But one solution could be to split the resources into sub-requests.

Assume that the group call of GET /images?car=mustang&viewangle=front returns 5 images. Now you could include all images as binary data and the GET-request itself has a unique ETag:

GET /images?car=mustang&viewangle=front
...
HTTP/1.1 200 OK
ETag: "aaaaaa"

data:image/png;base64,a123456....
data:image/png;base64,b123456....
data:image/png;base64,c123456....
data:image/png;base64,d123456....
data:image/png;base64,e123456....

The problem now is that one added image changes the ETag of the group call, and you need to transfer the complete set again although only one image has changed:

GET /images?car=mustang&viewangle=front
If-None-Match: "aaaaaa"
...
HTTP/1.1 200 OK
ETag: "bbbbbb"

data:image/png;base64,a123456....
data:image/png;base64,b123456....
data:image/png;base64,c123456....
data:image/png;base64,d123456....
data:image/png;base64,e123456....
data:image/png;base64,f123456....

In this case the best solution would be to separate the resource data from the group call, so that the response includes only the information needed for sub-requests:

GET /images?car=mustang&viewangle=front
...
HTTP/1.1 200 OK
ETag: "aaaaaa"

a.jpg
b.jpg
c.jpg
d.jpg
e.jpg

That way every sub-request can be cached separately:

GET /image/?src=a.jpg
If-None-Match: "Akj5odjr"
...
HTTP/1.1 304 Not Modified

Statistics
- First request = 6x 200 OK
- Future requests if group unchanged = 1x 304 Not Modified
- Future requests if one new resource has been added = 2x 200 OK, 5x 304 Not Modified

Now I would tune the API documentation. The requester must check whether a cached copy of a sub-request is available before making a call to it. This could be done by providing the ETags (or some other hash) in the response to the group request:

GET /images?car=mustang&viewangle=front
...
HTTP/1.1 200 OK
...
ETag: "aaaaaa"

a.jpg;AfewrKJD
b.jpg;Bgnweidk
c.jpg;Ckirewof
d.jpg;Dt34gsd0
e.jpg;Egk29dds
f.jpg;F498wdn4

Now the requester checks the cache and finds that the ETag of a.jpg has changed (the cache holds Akj5odjr, the listing says AfewrKJD) and that f.jpg;F498wdn4 is a new entry. Only those entries need to be fetched, so future requests are reduced:

Statistics
- First request = 6x 200 OK
- Future requests if group unchanged = 1x 304 Not Modified
- Future requests if one new resource has been added = 2x 200 OK
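
The client-side check could look roughly like this. It is only a sketch: it assumes the group response body is plain text with one "name;etag" pair per line, as in the example above, and the names parse_listing and plan_requests are invented for illustration.

```python
def parse_listing(body: str) -> dict[str, str]:
    """Parse lines of the form 'name;etag' into {name: etag}."""
    listing = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, etag = line.partition(";")
        listing[name] = etag
    return listing


def plan_requests(listing: dict[str, str], cached: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare the listing with the local cache.

    Returns (to_fetch, to_evict): entries that are new or whose ETag
    changed, and cached entries that no longer appear in the listing.
    """
    to_fetch = [name for name, etag in listing.items() if cached.get(name) != etag]
    to_evict = [name for name in cached if name not in listing]
    return to_fetch, to_evict
```

With the listing above and a cache that still holds Akj5odjr for a.jpg, to_fetch comes out as a.jpg and f.jpg, and nothing is evicted. Everything else is served from the cache without any request at all.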

Conclusion
Finally, you need to think about whether your resources are big enough to justify sub-requests and how often a requester repeats its group request (so that the cache is actually used). If not, you should include them in the group call, and there is no room for optimization.

P.S. you need to monitor all requesters to be sure all of them use caches. A possible solution would be to ban requesters calling an API URL two or more times without sending an ETag.

Fluid answered 2/4, 2015 at 19:54 Comment(0)

I've just encountered this limitation of ETag headers when trying to implement them for collection responses for my REST API.

I decided instead to maintain a mapping of resource IDs to generated ETags, which appears to work well, albeit at the cost of some complexity and not without limitations.

When generating the response, after performing the collection search, I do the following:

- Record a unique reference for each entity that appears within the collection (I'm using the PK from the database).
- Create the response and generate an ETag hash from the response body.
- Add a mapping of the ETag hash -> entity IDs to the cache. This means that each ETag key has an array of entity IDs associated with it.
- Generate a key that represents the request that was made (so that the same key is generated for the same request params). Store a cache entry under this key with the ETag as its value.
- Add the ETag header to the response and return it.

When a new GET request is made with If-None-Match:

- Regenerate the request cache key and check the cache for a matching value (an ETag).
- Also check the cached ETag -> entity map for the ETag hash.
- If a match is found for both, return a 304 Not Modified response.

When a PATCH/PUT/DELETE request is made to a single resource:

- Search the cached "map" for any ETags that have an entity ID matching the one that was just modified, and delete those entries.

One limitation of this is with POST requests. As newly created records could also appear in existing search results, all ETags for collections would need to be invalidated. To improve cache hits, I decided to add a resource namespace to my ETag -> entity ID map so that I only need to invalidate the mappings for a specific resource type.
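
A rough in-memory sketch of this bookkeeping follows. It is not my actual implementation: the class and method names are invented for illustration, plain dicts stand in for whatever cache backend you use, and ETags are treated as opaque unquoted strings.

```python
import hashlib


class CollectionEtagCache:
    def __init__(self):
        self.request_to_etag = {}   # request key -> etag
        self.etag_to_entities = {}  # (namespace, etag) -> set of entity ids

    @staticmethod
    def request_key(path: str, params: dict) -> str:
        """Same path and params always yield the same key, whatever the param order."""
        query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
        return hashlib.sha256(f"{path}?{query}".encode()).hexdigest()

    def store(self, namespace: str, path: str, params: dict, body: bytes, entity_ids) -> str:
        """Called after building a collection response; returns the ETag to send."""
        etag = hashlib.sha256(body).hexdigest()
        self.etag_to_entities[(namespace, etag)] = set(entity_ids)
        self.request_to_etag[self.request_key(path, params)] = etag
        return etag

    def is_fresh(self, namespace: str, path: str, params: dict, if_none_match: str) -> bool:
        """True if a 304 Not Modified can be returned for this GET."""
        cached = self.request_to_etag.get(self.request_key(path, params))
        if cached is None or cached != if_none_match:
            return False
        # Both lookups must succeed: the request key and the etag -> entity map.
        return (namespace, cached) in self.etag_to_entities

    def invalidate_entity(self, namespace: str, entity_id) -> None:
        """Called after PATCH/PUT/DELETE on a single resource."""
        stale = [key for key, ids in self.etag_to_entities.items()
                 if key[0] == namespace and entity_id in ids]
        for key in stale:
            del self.etag_to_entities[key]

    def invalidate_namespace(self, namespace: str) -> None:
        """Called after POST: a new record may belong to any cached search result."""
        stale = [key for key in self.etag_to_entities if key[0] == namespace]
        for key in stale:
            del self.etag_to_entities[key]
```

Entries left behind in request_to_etag after an invalidation are harmless here, because is_fresh also requires the ETag to be present in the entity map.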

Fastigium answered 14/4 at 16:31 Comment(0)
