Not understanding dd command arguments at all

I'm passingly familiar with the dd command, but I've rarely had the need to use it myself. Today I need to, but I'm encountering behavior that seems really weird.

I want to create a 100M text file, each line of which contains the single word "testing." This was my first try:

~$ perl -e 'print "testing\n" while 1' | dd of=X bs=1M count=100
0+100 records in
0+100 records out
561152 bytes (561 kB) copied, 0.00416429 s, 135 MB/s

Hmm, that's odd. What about other combinations?

~$ perl -e 'print "testing\n" while 1' | dd of=X bs=100K count=1K
0+1024 records in
0+1024 records out
4268032 bytes (4.3 MB) copied, 0.0353145 s, 121 MB/s

~$ perl -e 'print "testing\n" while 1' | dd of=X bs=10K count=10K
86+10154 records in
86+10154 records out
42524672 bytes (43 MB) copied, 0.35403 s, 120 MB/s

~$ perl -e 'print "testing\n" while 1' | dd of=X bs=1K count=100K
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.879549 s, 119 MB/s

So these four apparently equivalent commands all produce files of different sizes, and only one of them is the size I would expect. Why is that?

EDIT: By the by, I'm a little embarrassed I didn't think of "yes testing" instead of that longer Perl command.

Pother answered 21/7, 2011 at 21:7 Comment(1)
"dd has ibs/obs to deal with issues of differing input and output block sizes." - Asbestos
B
9

I'm not yet sure why, but when dd reads from a pipe like this, it does not fill up an entire block before writing it out. Try:

perl -e 'print "testing\n" while 1' | dd of=output.txt bs=10K count=10K iflag=fullblock
10240+0 records in
10240+0 records out
104857600 bytes (105 MB) copied, 2.79572 s, 37.5 MB/s

iflag=fullblock seems to force dd to accumulate input until the block is full, although I'm not sure why this is not the default, or what dd actually does by default.

Binary answered 21/7, 2011 at 21:24 Comment(0)
Y
10

To see what's going on, let's look at the output of strace for a similar invocation:

execve("/bin/dd", ["dd", "of=X", "bs=1M", "count=2"], [/* 72 vars */]) = 0
…
read(0, "testing\ntesting\ntesting\ntesting\n"..., 1048576) = 69632
write(1, "testing\ntesting\ntesting\ntesting\n"..., 69632) = 69632
read(0, "testing\ntesting\ntesting\ntesting\n"..., 1048576) = 8192
write(1, "testing\ntesting\ntesting\ntesting\n"..., 8192) = 8192
close(0)                                = 0
close(1)                                = 0
write(2, "0+2 records in\n0+2 records out\n", 31) = 31
write(2, "77824 bytes (78 kB) copied", 26) = 26
write(2, ", 0.000505796 s, 154 MB/s\n", 26) = 26
…

What happens is that dd makes a single read() call for each block and treats whatever that call returns as one record, even when it is shorter than bs. Since count limits the number of records, not the number of bytes, short reads produce a short file. One read() per block is appropriate when reading from a tape, which is what dd was originally mainly used for: on tapes, read() really reads a block. When reading from a file, you have to be careful not to specify too large a block size, or else the read will be truncated. When reading from a pipe, it's worse: the size of each block you read depends on how fast the producing command writes its data.
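
The short reads are easy to reproduce on a small scale. The sketch below is only an illustration, not something from the trace above: it assumes GNU dd (for iflag=fullblock), and slow_producer is a made-up helper that pauses between two one-byte writes so that each read() sees a single byte.

```shell
#!/bin/sh
# Hypothetical producer: writes one byte, pauses, then writes another.
# Each write reaches the pipe separately, so each read() returns one byte.
slow_producer() {
    printf 'a'
    sleep 1
    printf 'b'
}

# Default behaviour: one read() per record, so two partial 1-byte records.
plain=$(slow_producer | LC_ALL=C dd bs=2 of=/dev/null 2>&1 | sed -n 1p)

# GNU extension: keep calling read() until the 2-byte block is full.
full=$(slow_producer | LC_ALL=C dd bs=2 of=/dev/null iflag=fullblock 2>&1 | sed -n 1p)

echo "without fullblock: $plain"   # with GNU dd: 0+2 records in
echo "with fullblock:    $full"    # with GNU dd: 1+0 records in
```

In dd's N+M notation, N is the number of full records and M the number of partial ones, so the question's "0+100 records in" means a hundred reads, none of which filled a 1M block.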

The moral of the story is not to use dd to copy data, except with safe, small blocks, and never from a pipe except with bs=1.

(GNU dd has an iflag=fullblock flag to tell it to behave decently, but other implementations don't.)
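
For the original goal (a file of exactly 100 MiB of "testing" lines), one way to sidestep dd altogether is to let head count the bytes. This is a sketch rather than part of the answer above: it assumes a head that supports -c (the GNU and BSD versions do), and the file name X is taken from the question.

```shell
#!/bin/sh
# 100 MiB in bytes. This is a multiple of the 8-byte line "testing\n",
# so the file ends on a complete line.
size=$((100 * 1024 * 1024))

# head stops after exactly $size bytes, however the pipe delivers them.
# yes is then terminated by SIGPIPE, which is expected, so the
# pipeline's exit status is ignored.
yes testing | head -c "$size" > X || true

# Print the resulting file size in bytes.
wc -c < X
```

Because head counts bytes and not read() calls, the result does not depend on how fast the producer runs.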

Yatzeck answered 21/7, 2011 at 23:59 Comment(0)
P
3

My best guess is that dd reads from the pipe and, when the pipe is empty, assumes that it has read the whole block. The results are quite inconsistent:

$ perl -e 'print "testing\n" while 1' | dd of=X bs=1M count=100
0+100 records in
0+100 records out
413696 bytes (414 kB) copied, 0.0497362 s, 8.3 MB/s

user@andromeda ~
$ perl -e 'print "testing\n" while 1' | dd of=X bs=1M count=100
0+100 records in
0+100 records out
409600 bytes (410 kB) copied, 0.0484852 s, 8.4 MB/s
Pneumatophore answered 21/7, 2011 at 21:27 Comment(2)
Yep, same experience here. Seems to have something to do with reading from the pipe. Put a short sleep in the loop and you get even worse results. - Viticulture
Actually, if you looked real hard, you'd find they were entirely consistent, and the short reads always occur on a very specific variation on read() size - down to a byte. - Putup
