Can the infamous 'ERROR_NETNAME_DELETED' error be considered an error at all? [duplicate]

I'm writing a TCP server on Windows NT using completion ports for asynchronous I/O. I have a TcpSocket class, a TcpServer class, and some callbacks (virtual functions) that are called when an I/O operation completes, e.g. onRead() when a read completes. I also have onOpen() for when the connection is established, onEof() for when the connection is closed, and so on. I always keep a read pending on the socket: if data arrives (the read completes with size > 0), onRead() is called; if the client closes the socket from its side with closesocket(server_socket); (the read completes with size == 0), onEof() is called, so the server knows when the client has closed the connection.

Everything works well, but I have noticed one thing:

When I call closesocket(client_socket); on the server-side endpoint of the connection instead of on the client side (with or without setting linger to {true, 0}), the pending read completes with an error: not only is the read size == 0, but GetLastError() also returns error 64, 'ERROR_NETNAME_DELETED'. I have searched the web a lot about this, but found nothing useful.

Then I asked myself: is this a real error? I mean, can this really be considered an error?

The problem is that on the server side, the onError() callback is called instead of onEof() when I call closesocket(client_socket);. So I thought:

What if I call onEof() instead of onError() when this 'ERROR_NETNAME_DELETED' "error" is received? Would that introduce bugs or undefined behavior? Another important point that made me ask this question is the following:

When I received this read completion with 'ERROR_NETNAME_DELETED', I checked the OVERLAPPED structure, in particular the overlapped->Internal member, which contains the NTSTATUS code from the underlying driver. In a list of NTSTATUS codes [ http://www.tenox.tc/links/ntstatus.html ] we can see that 'ERROR_NETNAME_DELETED' is generated by NTSTATUS 0xC000013B, which is an error code, but it is called 'STATUS_LOCAL_DISCONNECT'. That doesn't look like the name of an error. It seems more like 'ERROR_IO_PENDING', which is an error code but also a status for correct behavior.
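For reference, an NTSTATUS value carries its severity in its top two bits (0 = success, 1 = informational, 2 = warning, 3 = error). The sketch below is plain C++ with the two values defined locally, so it compiles without the Windows headers. `severityOf` and the constant names are illustrative, not part of any API, and `kStatusPending` is the NTSTATUS that corresponds to 'ERROR_IO_PENDING'.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the two codes discussed; named here for illustration only.
constexpr std::uint32_t kStatusLocalDisconnect = 0xC000013B;  // value quoted in the question
constexpr std::uint32_t kStatusPending         = 0x00000103;  // NTSTATUS behind ERROR_IO_PENDING

enum class Severity : std::uint32_t {
    Success = 0,
    Informational = 1,
    Warning = 2,
    Error = 3
};

// The severity of an NTSTATUS is encoded in bits 31..30.
constexpr Severity severityOf(std::uint32_t status) {
    return static_cast<Severity>(status >> 30);
}
```

Decoded this way, the local-disconnect code has error severity while the pending code has success severity, so the two are not quite the same kind of status. Since the Internal member is reserved for system use, this helps explain the value rather than suggesting that code should depend on it.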

So what about checking the OVERLAPPED structure's Internal member and calling the onEof() callback when it equals 'STATUS_LOCAL_DISCONNECT'? Would that mess things up?

In addition, I should say that if I call DisconnectEx() on the server side before calling closesocket(client_socket);, I do not receive that error. But what if I don't want to call DisconnectEx()? E.g. when the server is shutting down and doesn't want to wait for all the DisconnectEx() completions, but just wants to close all connected clients.
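The dispatch idea proposed above (route 'ERROR_NETNAME_DELETED' to onEof() instead of onError()) could be sketched roughly as follows. This is portable C++ that only models the decision. `ReadCompletion`, `Event` and `classify` are hypothetical names, and the error value 64 is the one reported in the question, defined locally so that no Windows header is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// GetLastError() value reported in the question for ERROR_NETNAME_DELETED.
constexpr std::uint32_t kErrorNetnameDeleted = 64;

// Which callback a completed read would be routed to.
enum class Event { Read, Eof, Error };

// Hypothetical summary of one dequeued read completion.
struct ReadCompletion {
    bool ok;              // did the completion report success?
    std::uint32_t error;  // last-error value when !ok, otherwise 0
    std::size_t bytes;    // number of bytes transferred
};

// Maps a completion to onRead(), onEof() or onError().
Event classify(const ReadCompletion& c) {
    if (c.ok) {
        // Successful read: zero bytes means the peer closed the connection.
        return c.bytes == 0 ? Event::Eof : Event::Read;
    }
    if (c.error == kErrorNetnameDeleted) {
        // The proposal: treat the local-close status as end of stream.
        return Event::Eof;
    }
    return Event::Error;
}
```

A real server would fill in `ReadCompletion` from the result of the completion-port wait. Whether an onEof() handler may still touch the socket at that point is a separate question that this mapping alone does not settle.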

Dorri answered 24/1, 2013 at 10:53 Comment(1)
@Hans, I think he did a great job of describing how he encountered that error. - Othilie

It's entirely up to you how you treat an error condition. In your case this error condition is entirely to be expected, and it's perfectly safe for you to treat it as an expected condition.

Another example of this nature is when you call an API function but don't know how large a buffer to provide. So you provide a buffer that you hope will be big enough. But if the API call fails, you then check that the last error is ERROR_INSUFFICIENT_BUFFER. That's an expected error condition. You can then try again with a larger buffer.
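The buffer-size pattern described above might look like this in outline. `fakeGetName` is a made-up stand-in for any API that fills a caller-supplied buffer and reports the required size on failure. It is not a real Win32 function, and the constants are defined locally (122 is the value of ERROR_INSUFFICIENT_BUFFER) so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr std::uint32_t kErrorSuccess = 0;
constexpr std::uint32_t kErrorInsufficientBuffer = 122;  // ERROR_INSUFFICIENT_BUFFER

// Hypothetical API: on entry *size is the buffer capacity; on return it holds
// the number of bytes required, including the terminating NUL.
std::uint32_t fakeGetName(char* buf, std::size_t* size) {
    static const char kName[] = "completion-port-server";
    const std::size_t needed = sizeof(kName);
    if (*size < needed) {
        *size = needed;
        return kErrorInsufficientBuffer;
    }
    std::memcpy(buf, kName, needed);
    *size = needed;
    return kErrorSuccess;
}

// Guess a size first and treat "buffer too small" as an expected condition.
std::string getName() {
    std::vector<char> buf(8);  // optimistic first guess
    std::size_t size = buf.size();
    std::uint32_t err = fakeGetName(buf.data(), &size);
    if (err == kErrorInsufficientBuffer) {
        buf.resize(size);  // the call told us how much it needs
        err = fakeGetName(buf.data(), &size);
    }
    if (err != kErrorSuccess) {
        return std::string();  // any other failure is a genuine error
    }
    return std::string(buf.data());
}
```

Only the one anticipated code is handled specially; every other failure still takes the error path.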

Geometer answered 24/1, 2013 at 11:14 Comment(3)
I agree with you. The only drawback I can imagine is that Windows may generate ERROR_NETNAME_DELETED for other reasons as well, in which case a real error condition would trigger the onEof() callback instead of onError(). So maybe I can just check the NTSTATUS in the OVERLAPPED structure. - Dorri
You aren't meant to read the NTSTATUS value out of an OVERLAPPED struct. It's internal, and subject to change. The documentation is clear on that. - Geometer
Yeah, that's right, the Internal member is not meant to be used, so maybe I'll rely on GetLastError(). - Dorri

It's up to you how to treat an error condition, but the question is a sign of potential problems in your code (from logic errors to undefined behavior).

The most important point is that you shouldn't touch the SOCKET handle after closesocket. What do you do on EOF? It would be logical to call closesocket on our side when we detect EOF, but that's exactly what you cannot do in an ERROR_NETNAME_DELETED handler, because closesocket has already happened and the handle is invalid.

It's also instructive to imagine what happens if the pending read completes (with real data available) just before closesocket, and your application detects it right after closesocket. You handle the incoming data and... Do you send an answer to the client using the same socket handle? Do you schedule the next read on that handle? It would all be wrong, and there would be no ERROR_NETNAME_DELETED to tell you about it.

What happens if the pending read completes with EOF at that very unfortunate moment, just before closesocket? If your regular onEof() callback fires, and that callback calls closesocket, it would be wrong again.

The problem you describe might hint at a more serious problem if closesocket is called in one thread while another thread waits for I/O completion. Are you sure that another thread is not calling WSARecv/ReadFile while the first thread is calling closesocket? That's undefined behavior, even though Winsock makes it look as if it works most of the time.

To summarize, the code handling completed (or failed) reads cannot be correct if it's unaware that the socket handle is useless because it was closed. After closesocket, it's useful to wait for pending I/O to complete, because you can't reuse the OVERLAPPED structure if you don't; but there's no point in handling this kind of completion as if it happened during normal operation, with the socket still open (the error/status code is irrelevant).
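One possible way to make the completion handler aware that the handle is gone, sketched as portable C++: keep a per-connection "closed" flag and check it before interpreting any completion. The `Connection` class, its method names and the `Action` values are all hypothetical; the real closesocket and WSARecv calls are left out so that the sketch compiles anywhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical per-connection state shared by the closing thread and the
// threads that dequeue I/O completions.
class Connection {
public:
    enum class Action { HandleData, HandleEof, HandleError, ReleaseOnly };

    // Returns true only for the first caller, which is then the one thread
    // allowed to call closesocket() on the handle.
    bool markClosed() {
        return !closed_.exchange(true);
    }

    // Decides what to do with a dequeued read completion.
    Action onReadCompletion(bool ok, std::size_t bytes) const {
        if (closed_.load()) {
            // We closed the socket ourselves: the status code is irrelevant,
            // just release the OVERLAPPED and buffers tied to this read.
            return Action::ReleaseOnly;
        }
        if (!ok) {
            return Action::HandleError;
        }
        return bytes == 0 ? Action::HandleEof : Action::HandleData;
    }

private:
    std::atomic<bool> closed_{false};
};
```

The flag alone does not remove the race described above. In a real server the check and any follow-up call on the socket (for example posting the next read) would still need to be serialized against the thread that closes it, for instance with a per-connection lock.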

Dich answered 24/1, 2013 at 11:31 Comment(2)
You have a really good point. Basically, I use onEof() to run cleanup (e.g. deallocating memory), but I should really have two callbacks for that: onEof(), called only when the other side closes the connection, and onClose(). That way, when onEof() is received, the handler can call closesocket() as you suggested. So, if I understand correctly: if a pending read completes before closesocket() and the application detects it afterwards, ERROR_NETNAME_DELETED is there to signal this scenario, and that is its purpose? - Dorri
No, the "completes before" / "detected after" scenario produces no ERROR_NETNAME_DELETED; it's an example where things can go logically wrong in an undetected way. It's also an illustration of why it's likely wrong to do cleanup in the ERROR_NETNAME_DELETED callback: with this "unfortunate" timing for a successful read, there is nothing to handle (and there will be nothing to handle, because you can't reschedule WSARecv on a closed socket). - Dich

You're calling the wrong method. You should be calling WSAGetLastError(). The result of GetLastError() after a Winsock API call is meaningless.

Blackpool answered 24/1, 2013 at 22:29 Comment(2)
Actually I'm calling WSAGetLastError(), and the error value is the same with that too. I mentioned GetLastError() because I also use I/O completion ports for non-network I/O, but I think that error makes sense only for networked I/O. - Dorri
@MarcoPagliaricci So actually your question should say so. - Blackpool
