How does event-driven programming help a webserver that only does IO?

I'm considering a few frameworks/programming methods for our new backend project. It concerns a Backend-for-Frontend (BFF) implementation, which aggregates downstream services. For simplicity, these are the steps it goes through:

  1. Request comes into the webserver
  2. Webserver makes downstream request
  3. Downstream request returns result
  4. Webserver returns the response
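Sketched as code, the four steps might look like this (a minimal Python asyncio illustration; the service name and latency are made up, and a sleep stands in for the downstream network call):

```python
import asyncio

# Hypothetical downstream call: sleep stands in for network latency.
async def fetch_downstream(name: str) -> str:
    await asyncio.sleep(0.01)                      # step 3: result arrives later
    return f"result-from-{name}"

# Steps 1-4 as a single request handler: while the downstream call
# is pending, the event loop is free to run other handlers.
async def handle_request(request_id: int) -> str:
    result = await fetch_downstream("service-a")   # step 2
    return f"response-{request_id}: {result}"      # step 4

async def main() -> list:
    # Three concurrent "incoming requests" (step 1), one thread total.
    return await asyncio.gather(*(handle_request(i) for i in range(3)))

print(asyncio.run(main()))
```

All three handlers overlap their downstream waits on the same thread; no handler blocks the others.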

How is event-driven programming better than "regular" thread-per-request handling? Some websites try to explain, and it often comes down to something like this:

The second solution is a non-blocking call. Instead of waiting for the answer, the caller continues execution, but provides a callback that will be executed once data arrives.

What I don't understand: we need a thread/handler to await this data, right? It's nice that the event handler can continue, but we still need (in this example) a thread/handler per request that awaits each downstream request, right?

Consider this example: the downstream requests take n seconds to return. In those n seconds, r requests come in. In the thread-per-request model we need r threads: one for every request. After n seconds pass, the first thread is done processing and becomes available for a new request.

When implementing an event-driven design, we need r+1 threads: an event loop and r handlers. Each handler takes a request, performs it, and calls the callback once done.
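The premise that each in-flight downstream call occupies its own waiting thread is exactly where the two models differ. A minimal sketch (Python asyncio; a sleep simulates the n seconds of downstream latency) showing r overlapping requests waited on by a single thread:

```python
import asyncio
import threading

async def downstream() -> None:
    await asyncio.sleep(0.05)   # n seconds of downstream latency

async def main(r: int) -> int:
    # r overlapping requests, all awaiting the downstream call at once.
    await asyncio.gather(*(downstream() for _ in range(r)))
    # Report how many threads were alive to do all that waiting:
    return threading.active_count()

# With r = 100 in-flight requests, only the main thread exists.
print(asyncio.run(main(100)))
```

The pending calls are bookkeeping entries in the event loop, not parked threads, so the thread count stays at one regardless of r.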

So how does this improve things?

Nobell answered 3/4, 2018 at 14:43 Comment(6)
What I just thought of: if we have a handler that can handle multiple requests on the same thread, the benefit is clear: we only need one (or a few for redundancy) handlers, thus greatly reducing the number of threads needed. But this isn't really possible, right?Nobell
Why not? The "handler" is just a data structure that remembers the state for each connection (socket number, address, cookies, whatever) upstream and down and maintains the association between them, plus some code that knows how to process the next input, in effect, "turning the crank" on your state machine. Each connection has its own separate data structure / current state. There is no hard limit on how many requests can be handled within a single thread.Roxanaroxane
Can you elaborate on this? What is such an implementation called? I tried researching this but cannot find anything, but perhaps my keywords ("Multiple HTTP requests on same thread") are not rightNobell
As far as I understand, you always need a thread pool for doing parallel requests, since you always have to perform the request in a synchronous way. Yes, you can build an async API around this, but actually performing the request requires a waiting thread (again, as far as I understand)Nobell
No. A socket receive (when data is already available) is simply copying bytes from kernel buffer into user space buffer. A send is the reverse: copying from user space into kernel (assuming there is space in the kernel buffers). So, those operations are "synchronous" in the sense that nothing else is happening in your thread while they are ongoing but they complete almost instantaneously. select, poll and similar mechanisms allow you to wait for any one of multiple sockets to become "readable" or "writable".Roxanaroxane
Try these keywords: "async single thread server"Roxanaroxane
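The select/poll mechanism described in the comments above can be sketched with Python's selectors module (a hypothetical setup in which socketpair stands in for real client connections):

```python
import selectors
import socket

# A single thread multiplexing several connections with select/poll
# (the selectors module picks the best available mechanism).
sel = selectors.DefaultSelector()

# Two connected socket pairs stand in for two client connections.
a1, b1 = socket.socketpair()
a2, b2 = socket.socketpair()
for s in (b1, b2):
    s.setblocking(False)
    sel.register(s, selectors.EVENT_READ)

a1.send(b"hello")          # data arrives on the first connection only

# select() returns only the sockets that are actually readable,
# so the copy out of the kernel buffer never blocks.
ready = sel.select(timeout=1)
for key, _ in ready:
    print(key.fileobj.recv(1024))   # completes almost instantly

for s in (a1, b1, a2, b2):
    s.close()
```

The thread sleeps in one place for all registered sockets and only touches the ones with data, which is what lets a single thread track many connections' state machines.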

What I don't understand: we need a thread/handler to await this data, right?

Not really. The idea behind NIO is that no threads ever get blocked.

It is interesting because the operating system already works in a non-blocking way. It is our programming languages that were modeled in a blocking manner.

As an example, imagine that you had a computer with a single CPU. Any I/O operation you do will be orders of magnitude slower than the CPU, right? Say you want to read a file. Do you think the CPU will sit there, idle, doing nothing while the disk head goes and fetches a few bytes and puts them in the disk buffer? Obviously not. The operating system will register an interrupt handler (i.e. a callback) and will use the valuable CPU for something else in the meantime. When the disk head has managed to read a few bytes and made them available to be consumed, an interrupt will be triggered and the OS will then give attention to it, restore the previously suspended process's state and allocate some CPU time to handle the available data.

So, in this case, the CPU is like a thread in your application. It is never blocked. It is always doing some CPU-bound stuff.

The idea behind NIO programming is the same. In the case you describe, imagine that your HTTP server has a single thread. When you receive a request from your client, you need to make an upstream request (which represents I/O). What a NIO framework would do here is issue the request and register a callback for when the response is available.

Immediately after that, your valuable single thread is released to serve yet another request, which is going to register another callback, and so on.

When the callback resolves, it will be automatically scheduled to be processed by your single thread.

As such, that thread works as an event loop, one in which you're supposed to schedule only CPU bound stuff. Every time you need to do I/O, that's done in a non-blocking way and when that I/O is complete, some CPU-bound callback is put into the event loop to deal with the response.
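That scheduling behaviour can be sketched directly against a bare event loop (Python asyncio; call_later stands in for the completion of a non-blocking I/O operation, and the names are made up):

```python
import asyncio

events: list = []

def on_response(loop) -> None:
    # CPU-bound callback, scheduled once the "I/O" completes.
    events.append("callback: handle upstream response")
    loop.stop()

def main() -> list:
    loop = asyncio.new_event_loop()
    # Stand-in for non-blocking I/O: the response "arrives" in 10 ms
    # and its callback is put on the event loop.
    loop.call_later(0.01, on_response, loop)
    # Meanwhile the single thread keeps serving other work.
    loop.call_soon(events.append, "serve another request")
    loop.run_forever()
    loop.close()
    return events

print(main())
```

The loop runs the ready CPU-bound work first and picks up the I/O callback only when it fires, so the thread is never parked waiting on the response.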

This is a powerful concept, because with a very small number of threads you can process thousands of requests and therefore scale more easily. Do more with less.

This feature is one of the major selling points of Node.js and the reason why, even with a single thread, it can be used to develop backend applications.

Likewise, this is the reason for the proliferation of frameworks like Netty, RxJava, the Reactive Streams initiative and Project Reactor. They all seek to promote this type of optimization and programming model.

There is also an interesting wave of new frameworks that leverage these powerful features and try to compete with or complement one another. I'm talking about interesting projects like Vert.x and Ratpack. And I'm pretty sure there are many more out there for other languages.

Bignoniaceous answered 5/4, 2018 at 17:43 Comment(0)

The whole idea of the non-blocking paradigm is achieved by the concept called the "event loop".

Interesting references:

  1. http://www.masterraghu.com/subjects/np/introduction/unix_network_programming_v1.3/ch06lev1sec2.html
  2. Understanding the Event Loop
  3. https://www.youtube.com/watch?v=8aGhZQkoFbQ
Atp answered 7/3, 2019 at 8:49 Comment(0)
