Long running REST API with queues

We are implementing a REST API which will kick off multiple long-running backend tasks. I have been reading the RESTful Web Services Cookbook, and its recommendation is to return HTTP 202 Accepted with a Content-Location header pointing to the task being processed (e.g. http://www.example.org/orders/tasks/1234), and have the client poll this URI for updates on the long-running task.

The idea is to have the REST API immediately post a message to a queue, with a background worker role picking up the message from the queue and spinning up multiple backend tasks, also using queues. The problem I see with this approach is how to assign a unique ID to the task and subsequently let the client request the status of the task by issuing a GET to the Content-Location URI.

If the REST API immediately posts to a queue, then it could generate a GUID and attach that as an attribute on the message being added to the queue, but fetching the status of the request becomes awkward.

Another option would be to have the REST API immediately add an entry to the database (say, an order with a new order ID) with an initial status, and then put a message on the queue to kick off the background tasks, which would subsequently update that database record. The API would return this new order ID in the URI of the Content-Location header, for the client to use when checking the status of the task.
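
Roughly the flow I have in mind, as a minimal sketch; Flask and the in-memory stand-ins for the database and the queue are just placeholders for whatever storage and service bus we end up using:

```python
import uuid
from queue import Queue

from flask import Flask, jsonify

app = Flask(__name__)

orders = {}           # stand-in for the orders/tasks table
task_queue = Queue()  # stand-in for the message queue / service bus

@app.route("/orders", methods=["POST"])
def create_order():
    order_id = str(uuid.uuid4())

    # 1. Persist the task first so its status can be polled immediately.
    orders[order_id] = {"status": "pending"}

    # 2. Then enqueue the work for the background worker role.
    task_queue.put({"order_id": order_id})

    # 3. Respond with 202 Accepted pointing at the status resource.
    response = jsonify({"status": "pending"})
    response.status_code = 202
    response.headers["Content-Location"] = f"/orders/tasks/{order_id}"
    return response

@app.route("/orders/tasks/<order_id>")
def task_status(order_id):
    order = orders.get(order_id)
    if order is None:
        return jsonify({"error": "unknown task"}), 404
    return jsonify(order)
```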

Somehow adding the database entry first, then adding the message to the queue seems backwards, but only adding the request to the queue makes it hard to track progress.

What would be the recommended approach?

Thanks a lot for your insights.

Ret answered 8/10, 2015 at 7:49

I assume your system looks like the following. You have a REST service which receives requests from the client. It converts the requests into commands which the business logic can understand. You put these commands into a queue. You have one or more workers which process and remove these commands from the queue and send the results to the REST service, which can then respond to the client.

Your problem is that with long-running tasks the client connection times out before you can send a response. So what you can do is send a 202 Accepted after you put the commands into the queue, along with a polling link, so the client will be able to poll for changes. Since your tasks have multiple subtasks, there is real progress to report, not just pending and complete status changes.

  1. If you want to stick with polling, you should create a new REST resource which contains the actual state and the progress of the long-running task. This means you have to store this info in a database, so the REST service can respond to requests like GET /tasks/23461/status. It also means your worker has to update the database when it completes a subtask or the whole task (see the sketch after this list).
  2. If your REST service runs as a daemon, then the worker can notify it of progress, so storing the task status in the database won't be the responsibility of the worker. This kind of REST service can keep the info in memory as well.
  3. If you decide to use websockets to notify the client, then you can create a notification service. Over REST you respond with a task ID; the client then sends this task ID over the websocket connection, so the notification service knows which websocket connection is subscribed to the events of a given task. After that you won't need the REST service: you can send the progress through the websocket connection for as long as the client keeps it open.
  4. You can combine these solutions the following way. You let your REST service create a task resource, so you'll be able to access the progress through a polling link. Along with the 202 you send back an identifier, which the client then sends over the websocket connection, so you can use a notification service to notify the client. On progress your worker notifies the REST service, which creates a link like GET /tasks/23461/status and sends it to the client through the notification service. After that the client can use the link to update its status.
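
A minimal sketch of the worker side of option 1, again with in-memory stand-ins: in a real setup `tasks` would be the database table behind GET /tasks/<id>/status and `task_queue` the message queue, and the subtask names are made up:

```python
import time
from queue import Queue

tasks = {}            # stand-in for the task table the REST service reads
task_queue = Queue()  # stand-in for the command queue

def run_worker():
    """Consume commands and update the task record after each subtask."""
    while True:
        command = task_queue.get()  # blocks until a command arrives
        task_id = command["task_id"]
        subtasks = ["validate", "reserve stock", "charge", "ship"]  # made up

        for done, name in enumerate(subtasks, start=1):
            time.sleep(1)  # stand-in for the real subtask work
            # This is the update that GET /tasks/<id>/status reflects.
            tasks[task_id] = {
                "status": "in-progress",
                "progress": f"{done}/{len(subtasks)}",
                "current_subtask": name,
            }

        tasks[task_id] = {"status": "complete"}
        task_queue.task_done()
```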

I think the last one is the best solution if your REST service runs as a daemon, because you can move the notification responsibility to a dedicated notification service, which can use websockets, polling, SSE, whatever you want. It can go down without killing the REST service, so the REST service will stay stable and fast. If you also send back a manual update link with the 202, the client can update manually (assuming a human-controlled client), giving you something like graceful degradation when the notification service is not available. The notification service needs little maintenance, because it won't know anything about the tasks; it will just forward data to the clients. Your worker won't have to know anything about how to send notifications or how to create hyperlinks. The client code will be easier to maintain too, since it will be almost a pure REST client; the only extra feature is the subscription for the notification links, which does not change frequently.
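
To make that notification side concrete, here is a bare-bones sketch of such a service, assuming a recent version of the third-party `websockets` package; the subscribe message format and function names are made up for the example. Note that it knows nothing about tasks: it only maps task IDs to open connections and forwards whatever status links it is handed.

```python
import asyncio
import json

import websockets  # assumption: recent version of the `websockets` package

subscribers = {}  # task_id -> set of open websocket connections

async def handle_client(ws):
    """Clients send the task ID they received with the 202 to subscribe."""
    try:
        async for raw in ws:
            message = json.loads(raw)
            if message.get("action") == "subscribe":
                subscribers.setdefault(message["task_id"], set()).add(ws)
    finally:
        for connections in subscribers.values():
            connections.discard(ws)

async def push_status_link(task_id, status_link):
    """Called when the REST service reports progress on a task."""
    for ws in subscribers.get(task_id, set()):
        await ws.send(json.dumps({"task_id": task_id, "link": status_link}))

async def main():
    async with websockets.serve(handle_client, "localhost", 8765):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```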

Burlburlap answered 8/10, 2015 at 9:37
Thanks a lot for your insights. I'm all for pragmatic solutions, so it seems like the single-database approach will suffice. The task model can be extended to include subtasks as well, should the API support aggregated requests. Thanks! - Ret
Thanks for the update and suggestion! :) This setup would be a viable solution for sure. Our integrators may not be able to keep a connection open (legacy systems), but will most likely be able to poll. Bridging a REST API and the 202 Accepted approach with a service bus / queue is where the disconnect is for me. Ideally the service bus / queue would be the integration point for other systems, and the API in this case is just another way of integrating a system with the service bus (Channel Adapter). For your suggested setup to work, we would need another abstraction layer which deals with the update_event update, as I see it. So the API would: 1. insert a new row in the update_event table and return the new ID; 2. immediately insert a new request in the service bus queue. The client can now poll, etc. The worker role would: 1. pick up the queue message and dispatch the backend workflow; 2. once done, update the update_event table with the new resource URI, etc. The problem here is that the worker role will become aware of a client with special requirements, or? Meaning it's no longer generic to the messaging system? - Ret
@Ret Can you elaborate on the "worker role will become aware of a client with special requirements" part? - Burlburlap
@Ret I think if you want to use any server-push technology with REST, it is better to move this to a separate service which is responsible only for delivering the polling or update links to the client. So if you push the polling link to the client, it has to call it only once. With this solution the REST service won't depend on the push service, so it will be much more stable and won't suffer decreased scalability because of an integrated push service. Maintaining the client will be easier too, because it only needs to receive links and follow them. - Burlburlap
The worker role would have to know of the update_event table, but thinking about it, perhaps that's not just related to the REST API but a core feature of the system. - Ret
@Ret I'll edit my answer; it would be too long in comments. - Burlburlap
Thanks for taking the time to help out. I found an interesting article with a similar solution. I think the biggest aha moment for me in this article is treating the "task" as a resource itself, instead of as a request going into a queue, essentially promoting the "task" to a full-fledged resource in the context of the REST API. Using a table for the "task" resources means more work to have worker roles pick up tasks from the table and support "at most once" logic when scaling workers. Article in question: billhiggins.us/blog/2011/04/27/resty-long-ops - Ret
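
For what it's worth, the "at most once" pickup that article implies can be done with an atomic conditional UPDATE on the task table, so two scaled-out workers cannot claim the same row. A rough sketch; sqlite3 and the table layout are just for illustration:

```python
import sqlite3

def claim_next_task(conn: sqlite3.Connection, worker_id: str):
    """Atomically claim one pending task; returns its ID, or None."""
    with conn:  # single transaction; the conditional UPDATE is the guard
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'pending' LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing to do
        claimed = conn.execute(
            "UPDATE tasks SET status = 'claimed', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker_id, row[0]),
        ).rowcount
        return row[0] if claimed == 1 else None  # another worker won the race
```
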
@Ret Yes, that is an option. I read the question again and rewrote the answer; I think I misunderstood the question when I sent the first answer. - Burlburlap
Great answer! I was wondering if it would be a viable approach, when the client knows the response will be slow, for it to send a URL to the server where the server can post the result back. That eliminates the need for polling, but then there are no status updates. Any other drawbacks with this approach? - Birt
@Birt The REST endpoint may also respond with an estimated time which "guesses" when the issued task will be done; even so (e.g. it might take more time to finish the task in practice), you still need to poll for progress updates at an acceptable interval. - Automobile
For (2), it depends on the requirements: if the long-running task (issued by your REST endpoint) takes 10-20 hours and your application will have high traffic, you might not want to store the progress info in memory, or in a relational database if there will be lots of CRUD operations. In such a case you may consider NoSQL or file storage for keeping the progress info, given that the progress info is almost never updated once inserted into the database. - Automobile
