non-blocking IO vs async IO and implementation in Java

Trying to summarize for myself the difference between these two concepts, because I'm really confused when I see people using both of them in one sentence, like "non-blocking async IO", and I'm trying to figure out what that means.

So, in my understanding, non-blocking IO is primarily an OS mechanism: the IO is processed if there is any data ready; otherwise the call just returns an error or does nothing.

In async IO you just provide a callback, and your application will be notified when the data is available.

So what actually is "non-blocking async IO"? And how can all of them be implemented in Java (standard JDK, without external libs; I know there are java.nio.channels.{Channels, Selector, SelectionKey} and java.nio.channels.{AsynchronousSocketChannel}): non-blocking IO, async IO, and non-blocking async IO (if there is such a thing)?

Ineludible answered 2/8, 2014 at 21:37 Comment(3)
'Non-blocking async I/O' is just pointless double-talk. I don't understand why you think external libraries would be required. They are all ultimately just wrappers over operating system facilities.Capuchin
You understand the terms correctly. As noted, "non-blocking async IO" would be redundant. If the underlying I/O mechanism is non-blocking, it doesn't need to be async, and vice-versa. Maybe whoever described it that way means it's non-blocking because it's been made async. (Example: the android-async-http library is an async wrapper around synchronous socket I/O.)Tracay
@KevinKrumwiede could you provide an example where async IO is actually blocking? (The only thing I can imagine is that the callback and the main process share the same thread, and there is a wait/future.get() in the callback or similar.)Ineludible

So what is actually "non-blocking async IO"?

To answer that, you must first understand that there's no such thing as blocking async I/O. The very concept of asynchronism dictates that there's no waiting, no blocking, no delay. When you see non-blocking asynchronous I/O, the non-blocking bit only serves to further qualify the async adjective in that term. So effectively, non-blocking async I/O might be a bit of a redundancy.

There are mainly two kinds of I/O: synchronous and asynchronous. Synchronous I/O blocks the current thread of execution until processing is complete, while asynchronous I/O doesn't block the current thread, instead passing control to the OS kernel for further processing. The kernel then notifies the async thread when the submitted task is complete.


Asynchronous Channel Groups

The concept of async channels in Java is backed by asynchronous channel groups. An async channel group is a grouping of channels for the purpose of resource sharing: it encapsulates the machinery that handles the completion of I/O operations initiated on the channels bound to it. A channel opened without an explicit group (e.g. AsynchronousSocketChannel.open()) is bound to a default group that the JVM creates for you. Ultimately, async channel groups are backed by, surprise, thread pools. Also, asynchronous channels are thread-safe.

The size of the thread pool that backs the default async channel group can be configured with the following system property:

java.nio.channels.DefaultThreadPool.initialSize

which, given an integer value, sets the initial size of the thread pool backing the default group. Otherwise, the default group is created and maintained transparently to the developer.
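Instead of tuning the default group on the command line (e.g. -Djava.nio.channels.DefaultThreadPool.initialSize=4), you can also create a group yourself and bind channels to it. A minimal sketch, assuming JDK 7+ and an ephemeral loopback port; the class and method names are made up for illustration:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChannelGroupDemo {

    // Opens a server channel bound to an explicitly created group (instead of the
    // default group), then shuts the group down. Returns true if the group terminated.
    public static boolean openInCustomGroup(int threads) throws Exception {
        // A group backed by a fixed pool of `threads` threads; completion handlers
        // for channels bound to this group are run by these pooled threads
        AsynchronousChannelGroup group =
                AsynchronousChannelGroup.withFixedThreadPool(threads, Executors.defaultThreadFactory());

        // Passing the group to open() binds the channel to it
        AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open(group)
                .bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        System.out.println("Listening on " + server.getLocalAddress());

        server.close();
        // shutdown() lets the group terminate once all of its channels are closed
        group.shutdown();
        return group.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Terminated: " + openInCustomGroup(4));
    }
}
```

A fixed pool is only one option; AsynchronousChannelGroup also offers withCachedThreadPool and withThreadPool factories.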


And how can all of them be implemented in Java?

Well, I'm glad you asked. Here's an example of an AsynchronousSocketChannel (used to open a non-blocking client socket to a listening server). This sample is an excerpt from Apress Pro Java NIO.2, commented by me:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Future;

public class AsyncClient {
    public static void main(String[] args) throws Exception {
        //Create an asynchronous channel. No connection has actually been established yet
        try (AsynchronousSocketChannel asynchronousSocketChannel = AsynchronousSocketChannel.open()) {

            /*Connect to a server on the given address and port.
              connect() returns a Future, one of the two styles of asynchronous
              operation in java (the other takes a CompletionHandler callback).
              The result type is Void because nothing is returned after a successful
              connection. Calling get() blocks until the connection is established. */
            asynchronousSocketChannel.connect(new InetSocketAddress("127.0.0.1", 5000)).get();

            //Allocate data structures to use to communicate over the wire
            ByteBuffer helloBuffer = ByteBuffer.wrap("Hello !".getBytes(StandardCharsets.UTF_8));

            //Send the message
            Future<Integer> successfullyWritten = asynchronousSocketChannel.write(helloBuffer);

            //Do some stuff here. The point here is that asynchronousSocketChannel.write()
            //returns almost immediately, not waiting to actually finish writing
            //the hello to the channel before returning control to the currently executing thread
            doSomethingElse();

            //Now you can come back and check how many bytes were written.
            //Note that get() blocks the current thread if the write hasn't finished yet
            System.out.println("Bytes written " + successfullyWritten.get());
        }
    }

    //Placeholder for whatever work you want to do while the write is in flight
    private static void doSomethingElse() {
    }
}
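The example above uses the Future style. For comparison, here is a minimal sketch of the same write in callback style, where a CompletionHandler is told the result instead of the caller asking for it. To stay self-contained it assumes a throwaway server on an ephemeral loopback port, and the class and method names are hypothetical:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CallbackWriteDemo {

    // Connects to a local throwaway server and writes `message` using a
    // CompletionHandler instead of a Future. Returns the byte count the callback saw.
    public static int writeWithCallback(String message) throws Exception {
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()
                     .bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
             AsynchronousSocketChannel client = AsynchronousSocketChannel.open()) {

            Future<AsynchronousSocketChannel> accepted = server.accept();
            client.connect(server.getLocalAddress()).get();

            AtomicInteger written = new AtomicInteger(-1);
            CountDownLatch done = new CountDownLatch(1);
            ByteBuffer buffer = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));

            // write() returns immediately; the handler is invoked later by a pooled
            // thread of the channel's group, not by this thread
            client.write(buffer, null, new CompletionHandler<Integer, Void>() {
                @Override
                public void completed(Integer bytes, Void attachment) {
                    written.set(bytes);
                    done.countDown();
                }

                @Override
                public void failed(Throwable exc, Void attachment) {
                    done.countDown();
                }
            });

            // Only the demo waits here, so it can return the result
            done.await(5, TimeUnit.SECONDS);
            accepted.get().close();
            return written.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Bytes written " + writeWithCallback("Hello !"));
    }
}
```

In real code you would do the follow-up work inside completed() rather than wait on a latch; the latch exists only so the sketch can return a value.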

EDIT: I should mention that support for asynchronous channels (NIO.2) came in JDK 1.7.

Roee answered 2/8, 2014 at 23:43 Comment(9)
There are three kinds: blocking, non-blocking, and asynchronous. You've missed the point of the question about how they can be implemented in Java without external libraries.Capuchin
@EJP - There is internal support for Async I/O in Java without external libraries, I have that in my answer. On the matter of blocking/non-blocking, is there blocking async I/O? If you have samples, I'm happy to update my answerRoee
Async I/O is generally async because the I/O mechanism is blocking. In this context, asynchronous simply means it's done in another thread.Tracay
So, @KevinKrumwiede, does it mean, according to your definition, that every I/O is blocking, and the question is at what point in time, and in which thread, we block? Then we should talk only about sync/async IO and not mention blocking/non-blocking, because it blocks all the time: maybe not immediately, like future.get() waiting for the result, or from the async thread (we block the async thread's execution at some point).Ineludible
I suppose all I/O is blocking on some level, in the hardware if not in the software. Whether you call it blocking depends on what API is presented to you, i.e., whether it blocks your threads. If the I/O is non-blocking outside the API, it's because it's been made asynchronous on some level inside the API. That's why it's redundant to say "non-blocking async I/O." Non-blocking and async imply each other.Tracay
@kolossus, your answer is very apt. But, could it be said of the Servlet 3.1 implementation then? Servlet 3.0 came out with async servlets while using traditional IO and Servlet 3.1 came up with Non Blocking IO. So, if an async servlet uses non blocking IO, then won't it be called async non blocking?Abattoir
If successfullyWritten has not finished yet, will successfullyWritten.get() block the current thread? If so, this kind of usage seems ugly.Kilmer
Actually you can have async blocking IO. As long as any one OS thread is suspended in a wait state due to a blocking system call, there is blocking IO. Java's "NIO" is blocking async IO because, although the thread initiating the IO does so asynchronously (putting a task into the NIO queue so it can continue without waiting), the actual system call executed by the thread in the NIO pool is a blocking system call. Java should really start using the non-blocking system calls provided by the OS; then Java NIO wouldn't need any thread pool. In true NIO, there is no OS thread, just an IO interrupt handler.Turgescent
@AjaxLeung - One reason Java may not use the native async calls provided by the OS is that on some platforms, contrary to the strident assertions above, async I/O does, in fact, block: lwn.net/Articles/724198 And there is no way Java could just use an interrupt handler, because those are typically not run in user space, nor do they pass data back and forth to user space. In fact, under the hood, sometimes it's not an interrupt handler at all: lwn.net/Articles/663879Shephard

I see this is an old question, but I think something was missed here that @nickdu attempted to point out but didn't make quite clear.

There are four types of IO pertinent to this discussion:

Blocking IO

Non-Blocking IO

Asynchronous IO

Asynchronous Non-Blocking IO

The confusion arises I think because of ambiguous definitions. So let me attempt to clarify that.

First, let's talk about IO. IO operations can be either blocking or non-blocking; the difference is most apparent when the IO is slow. This has nothing to do with threads; it has to do with the interface to the operating system. When I ask the OS for an IO operation, I have the choice of waiting for all the data to be ready (blocking), or getting what is available right now and moving on (non-blocking). The default is blocking IO. It is much easier to write code using blocking IO, as the path is much clearer. However, your code has to stop and wait for the IO to complete. Non-blocking IO requires interfacing with the IO libraries at a lower level, using select and read/write instead of the higher-level libraries that provide convenient operations. Non-blocking IO also implies that you have something to work on while the OS works on the IO. This might be multiple IO operations, or computation on IO that has already completed.
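In Java that choice is made per channel with configureBlocking(false). A minimal sketch, using a java.nio.channels.Pipe so no network is needed (the class name is hypothetical): on a non-blocking channel, read() returns 0 right away when no bytes are available instead of waiting. On Unix-like systems the JDK backs Pipe with an OS pipe, so the second read sees the data immediately; other platforms may differ.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class NonBlockingReadDemo {

    // Returns {bytes read before anything was written, bytes read after a write}.
    public static int[] readBeforeAndAfterWrite() throws Exception {
        Pipe pipe = Pipe.open();
        try (Pipe.SourceChannel source = pipe.source(); Pipe.SinkChannel sink = pipe.sink()) {
            // Ask for non-blocking semantics: read() will never wait for data
            source.configureBlocking(false);
            ByteBuffer buffer = ByteBuffer.allocate(64);

            // Nothing has been written yet, so read() returns 0 immediately instead of blocking
            int before = source.read(buffer);

            sink.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
            // The bytes are now in the pipe, so read() returns what is available right now
            int after = source.read(buffer);
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = readBeforeAndAfterWrite();
        System.out.println("before=" + r[0] + " after=" + r[1]);
    }
}
```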

Blocking IO - The application waits for the OS to gather all the bytes needed to complete the operation, or to reach the end, before continuing. This is the default. To be more clear for the very technical: the system call that initiates the IO will install a signal handler waiting for a processor interrupt that will occur when the IO operation makes progress. Then the system call will begin a sleep, which suspends operation of the current process for a period of time or until the interrupt occurs.

Non-Blocking IO - The application tells the OS it only wants the bytes that are available right now, and moves on while the OS concurrently gathers more bytes. The code uses select to determine which IO operations have bytes available. In this case the system call will again install a signal handler, but rather than sleep, it will associate the signal handler with the file handle and immediately return. The process becomes responsible for periodically checking the file handle for the interrupt flag having been set. This is usually done with a select call.
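Java's counterpart to select is java.nio.channels.Selector (typically backed by epoll on Linux). A minimal sketch, again over a Pipe with a hypothetical class name, showing that the readiness check itself need not wait (selectNow()) and that the program reads only once the selector reports the channel readable:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class SelectDemo {

    // Returns {channels ready before any write, channels ready after a write}.
    public static int[] readyCounts() throws Exception {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {

            // A channel must be in non-blocking mode before it can be registered with a selector
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            // selectNow() checks readiness and returns immediately; nothing is readable yet
            int readyBefore = selector.selectNow();

            sink.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

            // select(timeout) waits up to 1s for at least one registered channel to become ready
            int readyAfter = selector.select(1000);
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    ByteBuffer buffer = ByteBuffer.allocate(64);
                    ((Pipe.SourceChannel) key.channel()).read(buffer);
                    System.out.println("read " + buffer.position() + " bytes");
                }
            }
            // The selector never clears the selected-key set itself
            selector.selectedKeys().clear();
            return new int[] {readyBefore, readyAfter};
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = readyCounts();
        System.out.println("ready before=" + r[0] + " after=" + r[1]);
    }
}
```

A real server would run the select/handle/clear steps in a loop over many registered channels, which is where the single-threaded concurrency comes from.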

Now, asynchronous is where the confusion begins. The general concept of asynchronous only implies that the process continues while the background operation is performed; the mechanism by which this occurs is not specified. The term is ambiguous, as both non-blocking IO and threaded blocking IO can be considered asynchronous. Both allow concurrent operations; however, the resource requirements are different, and the code is substantially different. Because you have asked "What is Non-Blocking Asynchronous IO", I am going to use a stricter definition of asynchronous: a threaded system performing IO which may or may not be non-blocking.

The general definition

Asynchronous IO - Programmatic IO which allows multiple concurrent IO operations to occur. IO operations are happening simultaneously, so that code is not waiting for data that is not ready.

The stricter definition

Asynchronous IO - Programmatic IO which uses threading or multiprocessing to allow concurrent IO operations to occur.

Now with those clearer definitions we have the following four types of IO paradigms.

Blocking IO - Standard single-threaded IO in which the application waits for each IO operation to complete before moving on. Easy to code, no concurrency, and therefore slow for applications that require multiple IO operations. The process or thread will sleep while waiting for the IO interrupt to occur.

Asynchronous IO - Threaded IO in which the application uses threads of execution to perform blocking IO operations concurrently. Requires thread-safe code, but is generally easier to read and write than the alternative. Incurs the overhead of multiple threads, but has clear execution paths. May require the use of synchronized methods and containers.

Non-Blocking IO - Single-threaded IO in which the application uses select to determine which IO operations are ready to advance, allowing the execution of other code or other IO operations while the OS processes concurrent IO. The process does not sleep while waiting for the IO interrupt, but takes on the responsibility of checking for the IO flag on the file handle. Much more complicated code due to the need to check the IO flag with select, though it does not require thread-safe code or synchronized methods and containers. Low execution overhead at the expense of code complexity. Execution paths are convoluted.

Asynchronous Non-Blocking IO - A hybrid approach to IO aimed at reducing complexity by using threads, while maintaining scalability by using non-blocking IO operations where possible. This would be the most complex type of IO requiring synchronized methods and containers, as well as convoluted execution paths. This is not the type of IO that one should consider coding lightly, and is most often only used when using a library that will mask the complexity, something like Futures and Promises.
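A rough sketch of what such a hybrid can look like in plain JDK Java (8+), under illustrative assumptions: pipes stand in for sockets, and the class name, pool sizes and the upper-casing "work" are made up. One thread runs a Selector loop doing non-blocking reads, a small worker pool does the processing, and callers only see CompletableFutures, the kind of masking described above:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HybridDemo {

    // Reads one message from each of `count` pipes. One I/O thread multiplexes all of
    // them with a Selector (non-blocking reads); a worker pool does the processing.
    public static List<String> readAll(int count) throws Exception {
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        Selector selector = Selector.open();
        List<Pipe> pipes = new ArrayList<>();
        List<CompletableFuture<String>> results = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false);
                CompletableFuture<String> result = new CompletableFuture<>();
                // The future rides along as the key's attachment
                pipe.source().register(selector, SelectionKey.OP_READ, result);
                pipes.add(pipe);
                results.add(result);
            }

            // Event loop: a single thread that only waits in select(), never in read()
            ioThread.submit(() -> {
                int pending = count;
                while (pending > 0) {
                    selector.select();
                    for (SelectionKey key : selector.selectedKeys()) {
                        ByteBuffer buf = ByteBuffer.allocate(64);
                        ((Pipe.SourceChannel) key.channel()).read(buf); // non-blocking
                        key.cancel(); // one message per pipe in this sketch
                        pending--;
                        buf.flip();
                        String raw = StandardCharsets.UTF_8.decode(buf).toString();
                        @SuppressWarnings("unchecked")
                        CompletableFuture<String> f = (CompletableFuture<String>) key.attachment();
                        // Hand the "work" to the pool so the I/O thread stays free
                        workers.submit(() -> f.complete(raw.toUpperCase(Locale.ROOT)));
                    }
                    selector.selectedKeys().clear();
                }
                return null;
            });

            // Producer side: write one short message into each pipe
            for (int i = 0; i < count; i++) {
                pipes.get(i).sink().write(ByteBuffer.wrap(("msg" + i).getBytes(StandardCharsets.UTF_8)));
            }

            List<String> out = new ArrayList<>();
            for (CompletableFuture<String> f : results) {
                out.add(f.get(5, TimeUnit.SECONDS));
            }
            return out;
        } finally {
            ioThread.shutdownNow();
            workers.shutdown();
            selector.close();
            for (Pipe p : pipes) {
                p.source().close();
                p.sink().close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAll(3)); // [MSG0, MSG1, MSG2]
    }
}
```

Even this toy shows why the answer warns against coding it lightly: the selector loop, key attachments, cross-thread completion and shutdown order all have to be right at once.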

Mate answered 14/10, 2016 at 17:39 Comment(33)
Frameworks like AKKA & vert.x, support non-blocking features. People often confuse them to be non-blocking IO frameworks. These frameworks do a lot of things but not non-blocking IO. They only support asynchronous IO as described above.Florio
so, akka with dispatcher is a way to get Asynchronous Non-Blocking IO?Fustigate
OK, what about the async approach in Python and NodeJS? They reduce the complexity of non-blocking IO without threads.Athene
The above response is entirely language agnostic. Many languages support some or all of the paradigms listed above. Python has a GIL, so even its threads are not truly async. You get the feeling of async without the true power. You can use them for IO, though, because the majority of the work is shoveled to the OS, which is async. Python does not greatly simplify non-blocking IO.Mate
There is somewhat of a misconception about NodeJS and threads: the assumption that, because you do not use threads in your code (the main event loop), NodeJS is not using threads. In fact, NodeJS does use threads for most of its IO. softwareengineeringdaily.com/2015/08/02/…Mate
asyncio in python is single threaded, but you say threaded IOAtiana
See my previous comment? This response is not about Python, or any specific language. That said, asynchronous is a generic term; some use it to describe non-blocking IO, as in the case you are mentioning. The usage in my answer here was meant to clear up the confusion when someone is trying to describe the different types of asynchronous IO. They are truly both asynchronous, as they happen at the same time, but when asking what the difference between async and non-blocking IO is, I believe the description above explains it best. With Python, due to the GIL, that is the only async IO.Mate
This is the most accurate answerMislay
Thank you. It can be a very confusing subject especially as a number of the terms are used interchangeably and differently depending on the project and platform. That was clearly the OPs confusion and I hoped to clear it up for him and others.Mate
You have mentioned that non-blocking IO is a single threaded IO. which I see in this code. Great point that you mentioned is execution paths are convoluted. But don't you think, multi-threading will help us make convoluted code to straight line code, by throwing my code on a thread. Read chapter1 first 4-5 lines from jcipAtiana
Can I say that, NodeJS async model is similar to chrome-V8 async model? I think, I don't agree with your final definition of Asynchronous IOAtiana
@Atiana Please elaborate. Don't think I said anything about the chrome-v8 async model, and as both chrome and nodejs are based on v8, it would be reasonable to say they are similar if not the same. As for the final definition of Asynchronous IO, how do you disagree?Mate
@Atiana The reason is the OP asking what "non-blocking async io" is. With a loose definition it is all async, thus non-blocking io is async, but then that means that "NB async IO" is essentially "NB NB io" or "async async IO". The name "NB async IO" implies that there is a specialization of either non-blocking or async IO, that combines two different things into one. Non-blocking IO is pretty clearly defined, thus async io is the one that is lacking in definition. For the name "non-blocking async io" to make sense the stricter definition is necessary.Mate
For your definition, Threaded IO in which the application uses threads of execution to perform Blocking IO operations concurrently. Firstly, there is no relevance on code for asynchronous IO being single or multi threaded. But, definitely there is relevance on code that has thread(single/multi) not being blocked, unlike you mentioned.Atiana
Still not following you and I explained the point you are trying to make. There is relevance on async IO being single or multi-threaded. For IO to be async in a single threaded execution, you MUST use non-blocking IO. For IO to be loosely considered asynchronous in a threaded execution you may use blocked threads or you may use non-blocking IO with unblocked threads. Thus non-blocking io (single threaded async), the very complex non-blocking io with threads known as Non-Blocking Async IO. What then do you call the one in the middle which is async IO with blocked threads?Mate
I choose and clearly stated why to distinguish that as "Asynchronous IO". It was merely algebraic. A = B + C where A = "Non-blocking Asynchronous IO", B = "Non-blocking IO" thus solving for C we have "Asynchronous IO".Mate
What I'd like to know is how the IO is managed by the OS and how it interrupts the process to let it know it's done. If I understand correctly, for non-blocking IO the app needs to poll the OS to know if an IO is done. How does the OS poll the hardware? Does it not itself create threads for that?Tannie
In nearly all cases and operating systems regardless of synchronous or asynchronous IO, the OS will "signal" the process that an IO operation is ready for reading. That signal is often the result of a hardware interrupt that takes place on the processor, from the hardware device performing the IO.Mate
The kernel receives the interrupt and translates it to a signal, the kernel determines the process id that owns that filehandle for which the interrupt came, and then sends the signal to the process. Regardless of threading or not threading, the signal will interrupt the execution of the process so long as the process is set to listen to that signal. The signal handler will immediately be run, and then the program will resume operation per normal.Mate
This is where the processes are a bit different. In a synchronous IO system, once the IO begins the process goes to sleep. When a signal interrupt occurs, the sleep is terminated, the signal handler run, and then the code immediately after the sleep begins to run. In a synchronous read call, this is when the read completes, and the data becomes available in process. It is also why read may not read the same amount of data you tell it to, because there may not have been enough data in the buffer when the interrupt occurred.Mate
In a non-blocking system instead of going to sleep, the process continues to do work. The signal handler will have a flag to indicate that it has been tripped. Each time that you check for available IO, typically using select, you are examining those flags that are set by the signal handler. Select will let you know which of your open file handles are available for IO, and then your process can read/write the available buffers, without sitting idly in a sleep.Mate
Adding threading to the mix, typically the primary thread will be the thread that is interrupted, and will be the thread that runs the signal handler. In the case of blocking IO, that primary thread will then need to notify the thread that is utilizing that IO channel, waking that thread from its sleep to continue processing the data. In the case of non-blocking IO, the primary thread's signal handler will simply change the flag on the file handle and return, leaving it to the other thread to check the file handle for available IO.Mate
@DidierA. Just in case it didn't notify you of my response.Mate
In this context, blocking IO is IO where a system call is executed, and that system call installs an interrupt handler, and goes to sleep waiting for the interrupt to fire, and break the sleep.Mate
With non-blocking IO, the system call executes, installs the interrupt handler associated with the file handle, and immediately returns, having not written or read any data. The program will have to check the file handle periodically to see if an IO interrupt has occurred and data is ready to transfer.Mate
Great explanation @AaronM. So, non-blocking IO requires the program to poll the file handle periodically to check if the IO interrupt has occurred. If I'm understanding this correctly, it sounds like a waste to poll when we have this interrupt mechanism in place. Are there techniques to have the IO interrupt "bubble up" to the program to forgo the need for polling?Parade
Personally, I think you are much better off to poll than to rely on an interrupt directly. Essentially that is what is happening with the polling, it is checking to see if a flag was set indicating that an IO interrupt occurred. This would be the proper way to handle an interrupt in any case, you should not do additional work inside of a signal handler. Further, as ugly as select and poll raw IO can get, signal handling code can be much worse.Mate
@Parade to be clear this polling is a very efficient operation, it is not like polling a database table or polling the remote end of the network socket. You are simply checking to see if the kernel has set the flag on the file descriptor to let you know there is data in the kernel buffers.Mate
Gotcha, yeah I figured it was an extremely fast check, but I imagine this check needs to be done multiple (hundreds? thousands? more?) times per second, which just feels weird to me.Parade
I don't understand why Node's underlying Libuv needs a thread pool. Can't it just have OS thread which takes the requests to initiate IO, then start the IO tasks async, return control to event loop, then once event loop is done, poll all open IO tasks until either 1 is found with bytes or if event loop queue has a new callback?Turgescent
@AjaxLeung It surely could, but if you are going to thread at all, per-thread overhead is minimal, and threaded IO code is much easier to write, read and reason about.Mate
Do you know if libuv's IO implementation is truly non-blocking, or is it blocking IO in multiple threads?Turgescent
@AjaxLeung I do not, I have not reviewed the source for that. However, if it is using a thread pool, it very likely is blocking IO in multiple threads, and not non-blocking, though it is possible that there is a non-blocking master that delegates actual IO to the thread pool, when data is available. But I do not know the answer to that.Mate

Non-blocking IO is when the call to perform IO returns immediately and does not block your thread.

The only way to know if the IO is done is to poll its status or block. Think of it as a Future. You start an IO operation, and it returns you a Future. You can call isDone() on it to check if it's done; if it is, do what you want with it, otherwise keep doing other stuff until the next time you want to check if it's done. Or, if you're out of things to do, you can call get() on it, which will block until it's done.
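That polling pattern can be sketched with the Future returned by AsynchronousFileChannel.read() (JDK 9+ for Thread.onSpinWait). Treat it as an illustration of polling with isDone(), not of how the JDK performs the read: as far as I know, on some platforms (Linux among them) the JDK implements asynchronous file I/O with a thread pool doing ordinary reads. Class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class PollingFutureDemo {

    // Starts a read that hands back a Future immediately, then polls isDone()
    // instead of blocking in get() straight away.
    public static String readByPolling(Path file) throws Exception {
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) Files.size(file));
            Future<Integer> pending = channel.read(buffer, 0);

            while (!pending.isDone()) {
                // Stand-in for "keep doing other stuff" between checks
                Thread.onSpinWait();
            }
            // isDone() returned true, so this get() does not block
            int bytesRead = pending.get();
            return new String(buffer.array(), 0, bytesRead, StandardCharsets.UTF_8);
        }
    }

    // Writes `text` to a temporary file, reads it back by polling, then deletes the file.
    public static String roundTrip(String text) throws Exception {
        Path file = Files.createTempFile("polling-demo", ".txt");
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            return readByPolling(file);
        } finally {
            Files.delete(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("polled!"));
    }
}
```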

Async IO is when the call to perform IO notifies you it is done through an event, not through its return value.

This can be blocking or non-blocking.

Blocking Async IO

What is meant by blocking async IO is that the call to perform IO is a normal blocking call, but the thing you called wrapped that call inside a thread which will block until the IO is done and then delegate the handling of the result of the IO to your callback. That is, there is still a thread lower down the stack which is blocked on the IO, but your thread isn't.
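A minimal sketch of blocking async IO with hypothetical names (JDK 9+ for readAllBytes()): an ordinary blocking read runs on a pool thread and its result is handed to a callback. The caller's thread moves on immediately, but a pool thread stays blocked until the data arrives:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class BlockingAsyncDemo {

    // Hypothetical helper: runs an ordinary blocking read on a pool thread and hands the
    // result to `callback`. The caller returns at once; the pool thread is the one that blocks.
    public static CompletableFuture<Void> readAsync(InputStream in, ExecutorService pool,
                                                   Consumer<String> callback) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // Blocks this pool thread until the writer closes the stream (EOF)
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, pool).thenAccept(callback);
    }

    // Demo: data is written only after readAsync() has returned, so the pool
    // thread really does sit blocked for a while. Returns what the callback received.
    public static String demo(String text) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<String> seen = new AtomicReference<>();
        try (PipedOutputStream out = new PipedOutputStream();
             PipedInputStream in = new PipedInputStream(out)) {
            CompletableFuture<Void> done = readAsync(in, pool, seen::set);
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.close(); // EOF lets readAllBytes() return on the pool thread
            done.get(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdown();
        }
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("callback got: " + demo("late data"));
    }
}
```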

Non-blocking Async IO

This is actually the more common one, and it means that the non-blocking IO does not need to be polled for its status, as with standard non-blocking IO; instead, it will call your callback when it's done. As opposed to blocking async IO, this one has no threads blocked anywhere down the stack, so it's faster and uses fewer resources, as the asynchronous behavior is managed without blocking threads.

You can think of it as a CompletableFuture. It requires that your program has some form of async event framework, which can be multi-threaded or not. So it's possible that the callback is executed in another thread, or that it is scheduled for execution on an existing thread once the current task is done.
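A minimal sketch of that shape in the JDK (8+): a hypothetical adapter (readAsync) turns AsynchronousSocketChannel's CompletionHandler-based read() into a CompletableFuture, so the next step is chained with thenApply rather than polled. Whether any thread blocks underneath depends on the platform's implementation; on Linux, NIO.2 socket channels are built on epoll. The class, methods and loopback setup are illustrative only:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class CompletableReadDemo {

    // Hypothetical adapter: exposes the CompletionHandler-based read() as a CompletableFuture.
    // Nothing here waits; the future is completed from a thread of the channel's group.
    public static CompletableFuture<Integer> readAsync(AsynchronousSocketChannel ch, ByteBuffer buf) {
        CompletableFuture<Integer> future = new CompletableFuture<>();
        ch.read(buf, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                future.complete(bytesRead);
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                future.completeExceptionally(exc);
            }
        });
        return future;
    }

    // Demo: a loopback server reads one message; decoding is chained with thenApply
    // instead of polling isDone() or calling get() on the read.
    public static String receiveOnce(String message) throws Exception {
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()
                     .bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
             AsynchronousSocketChannel client = AsynchronousSocketChannel.open()) {

            CompletableFuture<String> received = new CompletableFuture<>();
            server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
                @Override
                public void completed(AsynchronousSocketChannel conn, Void attachment) {
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    readAsync(conn, buf)
                            .thenApply(n -> {
                                buf.flip();
                                return StandardCharsets.UTF_8.decode(buf).toString();
                            })
                            .whenComplete((text, err) -> {
                                try {
                                    conn.close();
                                } catch (IOException ignored) {
                                    // best-effort close in a demo
                                }
                                if (err != null) {
                                    received.completeExceptionally(err);
                                } else {
                                    received.complete(text);
                                }
                            });
                }

                @Override
                public void failed(Throwable exc, Void attachment) {
                    received.completeExceptionally(exc);
                }
            });

            client.connect(server.getLocalAddress()).get();
            client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8))).get();
            // Only the demo blocks here, so it can return the result
            return received.get(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receiveOnce("hello"));
    }
}
```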

I explain the distinction more thoroughly here.

Tannie answered 6/5, 2018 at 23:12 Comment(10)
A callback is neither blocking nor non-blocking. I have never seen a framework/language/system where the thread will stop pending the call to the callback and then begin again where the callback is initiated. Perhaps such a system does exist but that would be quite bizarre. As you have stated typically the callback is registered, and execution continues independent of the call back. This answer feels very JavaScript centric when the question was agnostic or Java centric.Mate
Take a look at my clarification about how the underlying IO occurs. I think it will help sort this out a bit for you.Mate
@Mate I edited my answer to get rid of what I think gave you the impression I was confused. Your answer is good, but I felt it was a bit too detailed in the technicalities. I also disagree somewhat to some of your semantics, but only mildly. My examples are Java based, no JavaScript anywhere in my answer. I feel it applies generically to all languages and OS. Do you still see anything confusing or that you disagree with it now?Tannie
makes sense, I like it better now. The only issue I have is with the Async Non-Blocking. From the developer layer it seems accurate, but from the system layer it is not. If the IO is non-blocking then something must check to see if/when the IO completes. The kernel is not going to automatically call a function within your stack. But as you mentioned, this requires a framework, and that framework is going to manage that complexity for the developer. Thank heavens.Mate
As to the JavaScript comment, what I should have said was that it felt tilted towards an evented/functional programming environment, which I still feel that it is. That isn't as common in Java, and is very common in JavaScript, hence the comment I did make. But all of these types of IO are also used in non-event driven code, traditional procedural code as well. The async becomes a lot more complicated in that case, but it is very possible to do non-blocking async io without using a callback (or promise or future). The callback and other alternatives do make the code easier to follow.Mate
@Mate Actually, it is possible for the kernel to automatically call a function within your stack; this is how async IO is implemented on Windows NT kernels, for example. On Linux, you can only get multiplexed IO, like epoll, and then build an async dispatch on top, where you call a handler when the IO is complete. This is what Java NIO 2 added with AsynchronousChannel: an evented model, where you can pass a CompletionHandler that Java will call when the IO is complete.Tannie
That sounds very cool and would provide for a very performant IO implementation, though the security concerns would be significant. But I suspect we again have a misunderstanding. I think by kernel you mean the JVM kernel. By kernel I mean the operating system kernel. More specifically I mean those routines which execute in protected space, vs those routines that execute in user space. The kernel operates in protected space and that is where the direct access to hardware occurs. Java runs in user space and has no true direct access to hardware.Mate
Based on your comment I did some reading up on the Windows IO model. I agree it is very cool, but as far as I can tell, while the async is completely handled by the system call, a portion of that system call runs in user space and a portion runs in protected space. The portion that runs in user space still appears to be allocating threads for you to manage the asynchronous IO. The kernel is not actually calling your callback; that thread allocated for you, running in user space, is. If the kernel called your callback, it would execute in protected space, which is not good.Mate
@Mate I'm talking about both the OS Kernel, and the JVM. NIO 2 implements async based IO with completion events as a layer on top of all OS. For Windows, you technically don't need such an added layer, since the OS Kernel is already providing a completion event oriented async IO. I'd suggest you check this out blog.stephencleary.com/2013/11/there-is-no-thread.htmlTannie
@Mate In effect, the threads are not for the IO, but for handling what to do after the IO is completed. You can use your own, or the OS can provide you with its own user space threads, which it will make sure to properly distribute and maximise the use of. The OS thus fully handles reading all bytes from the socket or file directly into a user space buffer (so no copying is required when transitioning back to user space), and when it's done, it hands you a user thread to handle the completed IO.Tannie

I would say there are three types of IO:

synchronous blocking
synchronous non-blocking
asynchronous

Both synchronous non-blocking and asynchronous IO would be considered non-blocking, as the calling thread is not waiting on the IO to complete. So while "non-blocking asynchronous IO" might be redundant, the two are not one and the same. When I open a file, I can open it in non-blocking mode. What does this mean? It means that when I issue a read() it won't block: it will either return the bytes that are available or indicate that no bytes are available. If I didn't enable non-blocking IO, the read() would block until data was available. I might want to enable non-blocking IO if I want a single thread to handle multiple IO requests. For instance, I could use select() to find out which file descriptors, or sockets, have data available to read, and then do synchronous reads on those file descriptors. None of those reads should block, because I already know data is available, and I have opened the file descriptors in non-blocking mode.
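
In Java, the select()-then-read pattern described above maps onto java.nio.channels.Selector (on Linux the default implementation is typically epoll-based). A minimal sketch, assuming a loopback connection made only for the demo (class name and message are made up): channels are switched to non-blocking mode, the thread waits in select() for readiness, and then does an ordinary synchronous read() that cannot block.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorRead {
    public static String demo() throws Exception {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);          // a channel must be non-blocking to register
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Plain blocking client in the same thread, only here to generate traffic.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));

                StringBuilder received = new StringBuilder();
                SocketChannel accepted = null;
                while (received.length() < 5) {
                    selector.select();                // waits until a registered channel is ready
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {
                            accepted = server.accept();
                            accepted.configureBlocking(false);
                            accepted.register(selector, SelectionKey.OP_READ);
                        } else if (key.isReadable()) {
                            ByteBuffer buf = ByteBuffer.allocate(64);
                            // Synchronous read that cannot block: it returns what is available now.
                            int n = ((SocketChannel) key.channel()).read(buf);
                            if (n > 0) {
                                buf.flip();
                                received.append(StandardCharsets.UTF_8.decode(buf));
                            }
                        }
                    }
                }
                if (accepted != null) accepted.close();
                return received.toString();
            }
        }
    }
}
```

One thread services every registered channel here; the only place it waits is select(), never an individual read().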

Asynchronous IO is where you issue an IO request. That request is queued, so it doesn't block the issuing thread. You are notified when the request has either failed or completed successfully.
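
A minimal Java sketch of "issue a request, get notified of success or failure", using AsynchronousFileChannel with a CompletionHandler (Java 11+ for Files.writeString; the class name and file contents are made up). Hedged caveat: on Linux the JDK appears to service this channel with an internal thread pool doing ordinary reads, so the asynchrony is provided by the runtime rather than by the kernel.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncFileRead {
    // Issues the read and returns at once; the handler reports the outcome later.
    static CompletableFuture<String> read(AsynchronousFileChannel ch) {
        CompletableFuture<String> result = new CompletableFuture<>();
        ByteBuffer buf = ByteBuffer.allocate(64);
        ch.read(buf, 0, buf, new CompletionHandler<Integer, ByteBuffer>() {
            @Override
            public void completed(Integer n, ByteBuffer b) {   // request succeeded
                b.flip();
                result.complete(StandardCharsets.UTF_8.decode(b).toString());
            }

            @Override
            public void failed(Throwable exc, ByteBuffer b) {  // request failed
                result.completeExceptionally(exc);
            }
        });
        return result;
    }

    public static String demo() throws Exception {
        Path tmp = Files.createTempFile("async", ".txt");
        Files.writeString(tmp, "hello", StandardCharsets.UTF_8);
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(tmp, StandardOpenOption.READ)) {
            CompletableFuture<String> pending = read(ch);
            // The issuing thread is free to do other work here before collecting the result.
            return pending.get(5, TimeUnit.SECONDS);
        } finally {
            Files.delete(tmp);
        }
    }
}
```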

Hurlow answered 30/9, 2015 at 19:47 Comment(0)

Synchronous vs. asynchronous

Asynchronous is a relative term that applies to all kinds of computation, not just IO. Nothing is asynchronous by itself; it is always asynchronous relative to something else. Usually, asynchronicity means that some operation is happening in a different thread of execution from the thread that requested the IO computation, with no explicit synchronization (waiting) between the requesting and the computing threads. If the requesting thread waits (sleeps, blocks) while the computing thread is doing its work, we call such an operation synchronous. There are also mixed cases: sometimes the requesting thread doesn't wait immediately and performs some fixed amount of useful work asynchronously after issuing an IO request, but later blocks (synchronizes) to wait for the IO results if they are not yet available.
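
The mixed case can be sketched in Java as follows. slowIo() is a hypothetical stand-in for an IO request served by another thread, and the summation is just placeholder "useful work". The requesting thread issues the request, works for a while, and then synchronizes with join(), which blocks only if the result is not ready yet.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MixedSync {
    // Hypothetical stand-in for an IO request served by another thread.
    static String slowIo() {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "io-result";
    }

    public static String demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // 1. Issue the request: asynchronous relative to this thread.
            CompletableFuture<String> pending = CompletableFuture.supplyAsync(MixedSync::slowIo, worker);
            // 2. Do a fixed amount of useful work while the request is in flight.
            int sum = 0;
            for (int i = 1; i <= 1000; i++) {
                sum += i;
            }
            // 3. Synchronize: join() blocks only if the result is not yet available.
            String io = pending.join();
            return io + " " + sum;
        } finally {
            worker.shutdown();
        }
    }
}
```

demo() returns "io-result 500500".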

Blocking vs. non-blocking

In the broader sense, "blocking" and "non-blocking" can roughly be used to denote "synchronous" and "asynchronous" respectively. You will often encounter "blocking" used interchangeably with "synchronous", and "non-blocking" with "asynchronous". In this sense, "non-blocking asynchronous" is redundant, as other folks mentioned above.

However, in a more narrow sense "blocking" and "non-blocking" may refer to different kernel IO interfaces. It's worth saying here that all IO operations these days are performed by the OS kernel because access to IO hardware devices such as disks or network interface cards is abstracted away by the OS. It means that every IO operation that you request from your userspace code will end up being executed by the kernel via either blocking or non-blocking interface.

When called via the blocking interface, the kernel assumes that your thread wants to obtain results synchronously and puts it to sleep (deschedules, blocks it) until the IO results are available. Therefore that thread cannot do any other useful work while the kernel is fulfilling the IO request. As an example, regular-file (disk) IO through the standard read() and write() system calls on Linux is always blocking; the O_NONBLOCK flag has no effect on regular files.
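
A minimal Java sketch of a thread stuck in a blocking read, assuming a java.nio.channels.Pipe (channels start in blocking mode) and a hypothetical writer thread that supplies data after 300 ms. The reading thread can do nothing else until the data arrives.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class BlockingPipeRead {
    // Returns roughly how many milliseconds the reading thread spent blocked.
    public static long demo() throws Exception {
        Pipe pipe = Pipe.open();                      // channels start in blocking mode
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(300);                    // simulate data arriving later
                pipe.sink().write(ByteBuffer.wrap("hi".getBytes(StandardCharsets.UTF_8)));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        long start = System.nanoTime();
        writer.start();
        // Blocking read: this thread is descheduled until the writer supplies data.
        pipe.source().read(ByteBuffer.allocate(16));
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        writer.join();
        pipe.sink().close();
        pipe.source().close();
        return waitedMs;
    }
}
```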

Non-blocking kernel interfaces work differently. You tell the kernel which IO operations you want. The kernel doesn't block (deschedule) your thread and returns from the IO call immediately. Your thread can then move on and do some useful work while the kernel fulfills the IO requests asynchronously. Your code then needs to check occasionally whether the kernel has already done its job, after which you can consume the results. As an example, Linux provides the epoll interface for non-blocking IO; there are also the older poll and select system calls for the same purpose. It's worth noting that non-blocking interfaces mostly apply to, and are mostly used for, networking.
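
For contrast with a blocking read, here is a minimal Java sketch of the non-blocking interface, assuming a java.nio.channels.Pipe whose source is switched to non-blocking mode (on Linux the JDK backs Pipe with an OS pipe; the byte counts below assume that). read() returns immediately either way: 0 when nothing is available, or the number of bytes it could copy.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class NonBlockingPipeRead {
    public static int[] demo() throws Exception {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);   // comparable to setting O_NONBLOCK on the descriptor
        ByteBuffer buf = ByteBuffer.allocate(16);

        int first = pipe.source().read(buf);      // nothing written yet: returns 0 immediately

        pipe.sink().write(ByteBuffer.wrap("hi".getBytes(StandardCharsets.UTF_8)));
        int second = pipe.source().read(buf);     // data is there now: returns the 2 bytes read

        pipe.sink().close();
        pipe.source().close();
        return new int[] {first, second};
    }
}
```

Checking back later, or asking a Selector which channels are ready, is the "check occasionally" step; the read itself never parks the thread.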

Please note that the fact that some higher-level IO APIs use blocking kernel IO under the hood doesn't mean your thread will necessarily block when calling that API. Such an API may spawn a new thread, or use a different existing one, to perform that blocking IO, and later notify your calling thread through some means (a callback, an event, or by letting your thread poll) that the IO request has completed. That is, non-blocking IO semantics can be implemented in userspace by third-party libraries or runtimes on top of blocking OS kernel interfaces by using additional threads.

Conclusion

To understand how each particular runtime or library achieves IO asynchronicity, you will have to go deeper and find out if it spawns new threads or relies upon asynchronous kernel interfaces.

Afterword

Realistically, there is very little chance you will encounter genuinely single-threaded systems these days.

As an example, most people will refer to Node.js as having "single-threaded non-blocking" IO. However, this is a simplification. On Linux, truly non-blocking IO is only available for network operations through the epoll interface; for disk IO, the kernel will always block the calling thread. To achieve asynchronicity for (relatively slow) disk IO, the Node.js runtime (or libuv, to be precise) maintains a dedicated thread pool. Whenever an asynchronous disk IO operation is requested, the runtime assigns the work to one of the threads from that pool. That thread does standard blocking disk IO, while the main (calling) thread goes on asynchronously. And that is not counting the numerous threads that the V8 runtime maintains separately for garbage collection and other managed-runtime tasks.

Coles answered 21/2, 2020 at 11:10 Comment(0)
