Windows - Does accessing data through "localhost" incur network stack overhead

I have a large number of audio files that I am running through a processing algorithm to extract certain bits of data from them (e.g., the average volume of an entire clip). I have a number of build scripts that previously pulled the input data from a Samba network share, to which I had mapped a network drive via net use (i.e., M: ==> \\server\share0).

Now that I have a new massive 1TB SSD, I can store the files locally and process them very quickly. To avoid having to do a massive re-write of my processing scripts, I removed my network drive mapping, and re-created it using the localhost host name. ie: M: ==> \\localhost\mydata.

When I make use of such a mapping, do I risk incurring significant overhead, such as from the data having to travel through part of Windows' network stack, or does the OS use any shortcuts so that it equates more or less to direct disk access (i.e., does the machine know it's just pulling files from its own hard drive)? Increased latency isn't much of a concern for me, but maximum sustained average throughput is critical.

I ask this because I'm deciding whether or not I should modify all of my processing scripts to work with a different style for network paths.

Extra Question: Does the same apply to Linux hosts: are they smart enough to know they are pulling from a local disk?

Noway answered 18/11, 2015 at 21:5 Comment(5)
Throughput will be affected to some extent. On a spinning drive, the increased per-file overhead accounts for most of the performance loss, so if you are dealing with a small number of large files it probably won't be noticeable. On an SSD I don't know. Try it and see! - Hewlett
... but the best solution in this case is probably to use subst to assign a drive letter to the folder. The overhead on that is negligible, and the network stack is not involved. - Hewlett
The question seems to assume two possible answers: yes, the OS optimizes access to a drive mapped to a local share, or no, it doesn't. But it's not that simple. There are many layers in the stack between your app and its data. On Linux and Windows there will be some optimization at the network layer for a local network connection (at the very least, the MAC layer and below are avoided). However, the code path to a local drive will certainly not be the same as the code path to a mapped network drive. To Harry's point, the app's behavior can create a significant delta. Bottom line: benchmark to know for sure. - Americanism
Doesn't this depend on what methods you're using to read from? Can you post the method where you're fetching the data? - Loyal
@ChristopherBales I would be copying files via shell scripts using the XCOPY command, and passing a mapped drive created either via subst or net use, if one is faster than the other. - Noway

When I make use of such a mapping, do I risk incurring significant overhead,

Yes. By using a UNC path (\\hostname\sharename\filename) as opposed to a local path ([\\?\]driveletter:\directoryname\filename), you send all file access through the Server Message Block (SMB) protocol, the protocol that Samba implements. This adds significant overhead to disk access and to access times in general.
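If the scripts build their input paths in one place, translating the UNC form to the local form can be a small, mechanical step. The following is only a sketch: the share name `\\localhost\mydata` comes from the question, while the local folder `D:\audio` and the function name `to_local_path` are assumptions made up for illustration.

```python
import ntpath  # Windows path rules; importable on any platform


def to_local_path(path, share=r"\\localhost\mydata", local_root=r"D:\audio"):
    """Rewrite a UNC path under `share` to the matching path under `local_root`.

    `local_root` is a hypothetical folder. Paths that are not on `share`
    are returned unchanged.
    """
    # For a UNC path, splitdrive returns (\\host\share, \rest\of\path).
    drive, rest = ntpath.splitdrive(path)
    if drive.lower() != share.lower():
        return path  # not on the share we know how to translate
    return ntpath.join(local_root, rest.lstrip("\\/"))


print(to_local_path(r"\\localhost\mydata\clips\a.wav"))
print(to_local_path(r"C:\other\b.wav"))
```

The comparison is case-insensitive because Windows share names are; paths written with forward slashes are not handled in this sketch.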

The flow over a network is like this:

Application -> SMB Client -> Network -> SMB Server -> Target file system

Now by moving your files to your local machine, but still using UNC to access them, the flow is like this:

Application -> SMB Client -> localhost -> SMB Server -> Target file system

The only thing you minimized is network traffic, and even that is not eliminated: SMB traffic to localhost still passes through the network layers, with all the associated computation and traffic.

Also, given SMB is specifically tailored for network traffic, its reads may not optimally use your disk's and OS's caches. It may for example perform its reads in blocks of a certain size, while your disk performs better when reading blocks of another size.

If you want optimal throughput and minimal access times, use as few layers in between as possible, in this case by accessing the file system directly:

Application -> Target file system
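How much the extra layers cost on a given machine is easiest to settle by measuring. Below is a minimal sketch of a sequential-read benchmark; the function name `read_throughput` and the two root paths are assumptions for illustration (`D:\audio` stands in for wherever the files actually live), and it is not a rigorous benchmark.

```python
import os
import time


def read_throughput(root, chunk_size=1024 * 1024):
    """Read every file under `root` sequentially.

    Returns (total_bytes, megabytes_per_second).
    """
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    # Hypothetical roots: the same folder reached directly and via the local share.
    for root in (r"D:\audio", r"\\localhost\mydata"):
        if os.path.isdir(root):
            total, rate = read_throughput(root)
            print(f"{root}: {total} bytes at {rate:.1f} MB/s")
```

Files read a second time are likely to come from the OS file cache, so the second root measured may look faster than it is. Using a data set larger than RAM, or swapping the order between runs, gives a fairer comparison.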
Edwyna answered 3/12, 2015 at 12:8 Comment(8)
Does using a drive mapping via subst or net use remove some of that overhead compared to a UNC path? Also, is subst different from or better than net use in any way? Thank you. - Noway
No, the transfer will still happen over SMB over TCP through localhost, where most of the additional bottleneck lies. Subst and net use are only used to create different representations of the same resource, they don't fundamentally change how that resource is accessed. AFAIK though, can't look that up ATM. - Edwyna
Thank you. This is exactly what I was looking for. - Noway
Could you please provide an example for the second case, [\\.\]driveletter:\directoryname\filename. I tried ` \\.\c:\ ` to access my "C" drive, but the syntax appears to be invalid. - Noway
@Dogbert my bad, see #21195030. In your case it's \\?\C:. - Edwyna
Thank you. One last question: the example above works when I use the "Start ==> Run..." dialog box, but when I try to map a network drive via the Windows Explorer "Map Network Drive" or CLI "net use" command, it fails. Is the notation different in those cases? - Noway
Ah, in that case, I guess I just prefix my local paths with `\\?\C:` instead in my scripts. Thanks! - Noway
You generally don't need that syntax anyway. I just mentioned it for completeness's sake. - Edwyna

Using TCP instead of direct file access certainly has overhead, even over loopback: routing, memory allocations and so on, on both Linux and Windows. The loopback device is a non-physical kernel device and is faster than other network devices, but it is not faster than direct file access. As far as I know, Windows has additional loopback optimizations such as NetDMA and "TCP Loopback Fast Path".

I assume the bottleneck with the loopback device will be memory copy operations. So accessing a file directly rather than over the loopback device will always be faster (and consume fewer resources) on both Linux and Windows.

Additionally, both operating systems avoid protocol overhead for IPC through "named pipes" on Windows and "Unix domain sockets" on Linux. Where applicable, using these will also be faster than using the loopback device.
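The extra copying over loopback can be observed with a small experiment. The sketch below reads a file once directly and once through a plain TCP connection to 127.0.0.1. It does not involve SMB at all, so it only illustrates the loopback part of the cost; the function names and the 32 MiB test size are arbitrary choices for illustration.

```python
import os
import socket
import tempfile
import threading
import time

CHUNK = 64 * 1024


def read_direct(path):
    """Read the file straight from the file system; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total


def read_via_loopback(path):
    """Serve the file over TCP on 127.0.0.1 and read it back; return bytes read."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)

    def serve():
        conn, _addr = server.accept()
        with conn, open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                conn.sendall(chunk)
        # Leaving the with-block closes the connection, which ends the client loop.

    thread = threading.Thread(target=serve)
    thread.start()
    total = 0
    with socket.create_connection(server.getsockname()) as client:
        while chunk := client.recv(CHUNK):
            total += len(chunk)
    thread.join()
    server.close()
    return total


def timed(func, path):
    """Return (bytes_read, seconds_taken) for one call of func(path)."""
    start = time.perf_counter()
    size = func(path)
    return size, time.perf_counter() - start


if __name__ == "__main__":
    # Throwaway test file; the size is arbitrary.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(32 * 1024 * 1024))
    try:
        for func in (read_direct, read_via_loopback):
            size, seconds = timed(func, tmp.name)
            print(f"{func.__name__}: {size} bytes in {seconds:.3f} s")
    finally:
        os.remove(tmp.name)
```

Because the test file was just written, both reads are probably served from the file cache, so the difference seen is mostly the socket path rather than the disk.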

Schroeder answered 3/12, 2015 at 11:42 Comment(0)
