Which USB read pattern is more efficient: Multiple reads or one big read?
Which is the more efficient (i.e. fastest) implementation for transferring data over USB and writing it to memory for further processing:

  1. reading a small amount of data from USB and writing it to memory, repeated many times, or
  2. reading one huge block of data from USB and writing it to memory in one go.
Duad answered 30/4, 2019 at 7:42 Comment(6)
Historically, and for hard disks, reading and writing blocks of the size of the underlying hardware, aligned at block boundaries, was best. Writing data to part of a hardware block sometimes required reading that whole block, updating the in-memory buffer, and writing the whole block back. Modern hardware and (device-driver) software work hard to reduce those block-related overheads.Kerato
@Kerato What about the I/O overheads? Do they exceed the memory-write overheads?Duad
@Kerato "Writing data to part of a hardware block sometimes required reading that whole block, updating the in-memory buffer and writing the whole block back." It still does. Write four bytes to a disk that uses 2048-byte blocks and the entire block will have to be read, modified, and written back to the disk. "Modern hardware and (device-driver) software works hard to reduce those block-related overheads." And it usually does a good job, but if you want to run a system at or near its design limits, you can't abstract away or ignore the actual design.Hedjaz
Can you use mmap()?Galer
"Efficient" in what sense? Latency? Throughput? Energy use? USB device wear?Reparable
@Reparable Efficiency from a time point of view.Duad
  1. reading a small amount of data from USB and writing it to memory, repeated many times, or
  2. reading one huge block of data from USB and writing it to memory in one go.

You should remember that a memory reference is always the fastest option; there is no competition there. However, it is not always ideal to hold a large chunk of data in memory at all times.

Of your two options, the better one for both speed and cleanliness is the second. It significantly reduces the amount of I/O needed to get the data.

The problem with opening and closing a stream many times, as the first option implies, is that each close blocks until all buffered data is flushed. Besides defeating disk-caching mechanisms, the I/O blocks until it finishes, over and over, which can add up to much longer total times.

Unless you absolutely have to use option 1, option 2 is generally the better choice. As always, though, the best way to check is to benchmark: what works for you might not work for someone else.

This Stack Overflow discussion may interest you; it's not explicitly about C (rather, it's C++), but the underlying ideas are the same: Many small files or one big file? (Or, Overhead of opening and closing file handles) (C++)

Llanes answered 6/5, 2019 at 13:14 Comment(0)

In my experience it's better to read a lot of data from USB at once in order to reduce latency introduced by the OS. A long time ago I wrote an application that had to write data to a device over USB in raw mode. The device used a 128-byte array to store data from the other side (Windows, in my case). When I increased the buffer size on the device side, allocating 1 MB of space, I got a great increase in performance.

Lightening answered 2/5, 2019 at 15:38 Comment(0)

A RAM access is always (*) faster than a real device access (USB, disk, ...)...

Timings depend on your hardware, but for a small amount of data a RAM access is a matter of nanoseconds, while a USB access can range from tens of µs to milliseconds. That's not specific to USB, though: a RAM access is also faster than an SSD access, and even more so compared to a USB access.

Another interesting thing to note is that the access time is not proportional to the size of the data. This is especially true for the first few megabytes (partly due to caches). So the more you can read at once, the better your performance will be.

Finally, when your data is stored in RAM, the most frequently used parts are cached, resulting in even lower latencies.

Therefore, whenever possible, you should read the data all at once and keep it in RAM for subsequent accesses.

(*) The only limit to this rule is the size of your RAM. If your program uses more RAM than the machine physically has, the extra data will be swapped: the least-accessed data is transferred to your physical disk and retrieved when needed. This obviously results in catastrophic performance.

In conclusion, read a huge amount at once, but no more than you have RAM to store. Reading more than 1 GB at a time won't significantly improve performance and can only cause trouble.

Gramps answered 2/5, 2019 at 12:46 Comment(2)
OP did not mention any "disks"; OP's system might not even have disks.Reparable
@Reparable Oops, I was thinking about USB but I wrote HDD. That's corrected.Gramps

It all depends on your definition of performance. If you want to get the data off of a USB device as fast as possible, one big read will do the trick.

However, one big read can fail partway through or block for a long time. Doing multiple small reads often lets you retry a partial read when an error happens, and also lets you update a UI after each partial read completes.

Zamarripa answered 7/5, 2019 at 22:9 Comment(0)
