Answering in two parts here:
- How is an Async servlet more beneficial than the thread-per-request model we had earlier: That model is long dead. Almost all Java servers now use NIO, which lets them handle hundreds of connections with a handful of threads. You can verify this for your app server, and you will be pleasantly surprised to see that it does use NIO :). Async Servlets have nothing to do with the one-thread-per-request model.
- Then why Async Servlets: Async Servlets allow the thread that received the original request to return without waiting for the async task (which is presumably long running) to complete. That thread is freed immediately and can serve other clients, while the response is completed once the async task finishes. The async task is processed later by the server in one of its threads. These async operations are typically run in a separate thread pool meant for async work; the thread pool that handles client connections is not used for them.
Update: Some more details on why we need Async Servlets even though we are already using an NIO thread pool
I jotted down my notes on non-blocking IO some time ago at http://manish-m.com/?p=996. You might also look at a related post at http://manish-m.com/?p=915 (particularly the IO Playground section on that page).
The NIO thread pool is for handling multiple connection requests. It uses the non-blocking IO feature of the kernel, so that a small number of threads can work with many connections.
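To make the "few threads, many connections" point concrete, here is a minimal sketch using plain `java.nio` from the JDK. It is not how any particular servlet container is implemented; the class name `SelectorEchoDemo`, the echo protocol, and the `run` helper are made up for illustration. One thread runs the selector loop and serves every connection, while the calling thread plays the role of the clients.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class: a single-threaded NIO echo server plus test clients.
final class SelectorEchoDemo {

    /** Event loop: ONE thread accepts and echoes for every registered connection. */
    static void eventLoop(Selector selector) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (!Thread.currentThread().isInterrupted()) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        if (client.read(buf) < 0) { // peer closed the connection
                            client.close();
                            continue;
                        }
                        buf.flip();
                        while (buf.hasRemaining()) client.write(buf); // fine for tiny messages only
                    }
                }
            }
        } catch (IOException e) {
            // Also reached when the thread is interrupted during shutdown; the sketch just stops.
        }
    }

    /** Opens `clients` sockets at the same time and returns how many got a correct echo. */
    static int run(int clients) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0 = any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            Thread loop = new Thread(() -> eventLoop(selector), "nio-loop");
            loop.setDaemon(true);
            loop.start();

            List<SocketChannel> sockets = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                sockets.add(SocketChannel.open(server.getLocalAddress())); // all stay open together
            }
            int echoed = 0;
            for (int i = 0; i < clients; i++) {
                SocketChannel ch = sockets.get(i);
                byte[] msg = ("hello-" + i).getBytes(StandardCharsets.UTF_8);
                ch.write(ByteBuffer.wrap(msg));
                ByteBuffer reply = ByteBuffer.allocate(msg.length);
                while (reply.hasRemaining() && ch.read(reply) >= 0) { /* keep reading */ }
                if (Arrays.equals(msg, reply.array())) echoed++;
                ch.close();
            }
            loop.interrupt(); // wakes the selector and ends the loop
            loop.join();
            for (SelectionKey key : selector.keys()) key.channel().close(); // release server-side sockets
            return echoed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("echoed " + run(50) + " of 50 connections on one server thread");
    }
}
```

Note that the loop thread never waits on any single client. If the echo step were replaced by something slow and blocking, every other connection would stall behind it.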
However, the same threads that read data from the network buffer also execute the "user code" (the code we write within servlets). The servlet container's NIO framework handles accepting client requests, but it cannot do anything on its own about blocking user code written by us. If we write a DB query that takes, say, 10 seconds, the container's framework cannot make it asynchronous by itself, and any blocking code in a servlet blocks a thread of the original NIO thread pool. Hence, anything that we think can potentially block the container's request threads has to be written explicitly as an Async servlet in Java EE.
Similarly, when we use other NIO frameworks such as Netty or MINA, we need to make sure that our code does not block the NIO threads that handle the network connections. This is usually achieved by off-loading such long-running tasks to another thread pool (which is what the container does when you write an async servlet).
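The off-loading idea can be sketched with nothing but `java.util.concurrent`. This is only an illustration under assumptions, not container or Netty code: `OffloadDemo`, `readRequest`, `blockingQuery`, the pool size of 4, and the 200 ms sleep are all invented here. A single thread named `io-thread` stands in for the NIO thread, and the slow step is handed to a separate worker pool so the I/O thread is never the one that blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: quick work on the "I/O thread", blocking work on a worker pool.
final class OffloadDemo {

    /** Cheap, non-blocking step; stands in for what an NIO thread does per request. */
    static String readRequest(int id) {
        return "req-" + id + " read on " + Thread.currentThread().getName();
    }

    /** Stand-in for blocking user code such as a slow DB query. */
    static String blockingQuery(String request) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return request + ", query ran on " + Thread.currentThread().getName();
    }

    /** Handles `requests` requests and returns one result line per request, in request order. */
    static List<String> handle(int requests) {
        ExecutorService ioThread = Executors.newSingleThreadExecutor(r -> new Thread(r, "io-thread"));
        AtomicInteger n = new AtomicInteger();
        ExecutorService workers =
                Executors.newFixedThreadPool(4, r -> new Thread(r, "worker-" + n.incrementAndGet()));
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                pending.add(CompletableFuture
                        .supplyAsync(() -> readRequest(id), ioThread)          // quick step on the I/O thread
                        .thenApplyAsync(OffloadDemo::blockingQuery, workers)); // slow step on the worker pool
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : pending) {
                results.add(f.join()); // wait for each response to be completed
            }
            return results;
        } finally {
            ioThread.shutdown();
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        handle(8).forEach(System.out::println);
    }
}
```

Every printed line should show the read happening on `io-thread` and the query on one of the `worker-N` threads. If `blockingQuery` ran directly on `io-thread` instead, the eight requests would be served strictly one after another, which is the situation an async servlet is meant to avoid.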