That's going to be very slow though, as it will read and write everything one byte at a time. The problem of course is that you can't specify seek/skip/count values as bytes unless your block size is 1 byte.
If you don't need seek then you can at least use ibs=1 instead, as skip/count operate on input blocks, but this will still read one byte at a time even though it will aggregate the reads into larger output blocks.
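A minimal sketch of the ibs=1 workaround, assuming GNU dd and a throwaway test file (the file contents and offsets are made up for illustration). skip= and count= are measured in input blocks, so ibs=1 turns them into byte values, while a larger obs lets dd batch the output writes. The reads themselves are still one byte each.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical input: a 6-byte header, a 13-byte payload, then a trailer.
printf 'HEADERpayload-bytesTRAILER' > "$tmp/in"

# skip=/count= count input blocks; with ibs=1 they are byte offsets.
# obs=64K collects the 1-byte reads into larger writes.
dd if="$tmp/in" of="$tmp/out" ibs=1 obs=64K skip=6 count=13 2>/dev/null

payload=$(cat "$tmp/out")
echo "$payload"   # payload-bytes
```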
It would be really nice if we had a dd2 tool that offered similar options but defaulted the block size to a new "auto" value, treated seek/skip as byte values, and treated count as a byte value if the input block size is "auto".
You know, you can dd to a file to avoid destroying a drive. And you can provide a `count` to write fixed-size output files: dd will write bs×count bytes (technically it writes ibs×count, and bs sets both ibs and obs).
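A quick check of the size arithmetic, as a sketch assuming GNU dd writing to scratch files rather than a drive: with bs= the output is bs×count bytes, and with separate ibs/obs it is ibs×count.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# bs=4K sets both ibs and obs; count=3 copies 3 input blocks: 3 * 4096 bytes.
dd if=/dev/zero of="$tmp/a.img" bs=4K count=3 2>/dev/null
# count= multiplies ibs, not obs: 3 * 1024 bytes even though obs=4K.
dd if=/dev/zero of="$tmp/b.img" ibs=1K obs=4K count=3 2>/dev/null

size_a=$(wc -c < "$tmp/a.img" | tr -d ' ')
size_b=$(wc -c < "$tmp/b.img" | tr -d ' ')
echo "$size_a $size_b"   # 12288 3072
```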
Partial reads won't corrupt the data. dd will issue further read() calls until 1MB of data is buffered. iflag=fullblock is only useful when counting or skipping bytes or doing direct I/O. See line 1647: https://github.com/coreutils/coreutils/blob/master/src/dd.c#...
According to the documentation of dd, "iflag=fullblock" is required only when dd is used with the "count=" option.
Otherwise, i.e. when dd has to read the entire input file because there is no "count=" option, "iflag=fullblock" does not have any documented effect.
From "info dd":
"If short reads occur, as could be the case when reading from a pipe for example, ‘iflag=fullblock’ ensures that ‘count=’ counts complete input blocks rather than input read operations."
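The short-read effect is easy to reproduce with a pipe. In this sketch (GNU dd assumed; `producer` is a made-up stand-in for any writer that delivers data in bursts) the first read() normally returns only the first burst, so count=1 can stop early unless iflag=fullblock is given.

```shell
set -eu

# Hypothetical bursty writer: 3 bytes, a pause, then 3 more bytes.
producer() { printf 'abc'; sleep 1; printf 'def'; }

# count=1 without fullblock: one read(), which may return fewer than 6 bytes.
short=$(producer | dd bs=6 count=1 2>/dev/null)
# With iflag=fullblock, dd keeps calling read() until the 6-byte block is full.
full=$(producer | dd bs=6 count=1 iflag=fullblock 2>/dev/null)

printf 'without fullblock: %s\nwith fullblock:    %s\n' "$short" "$full"
```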
Huh, pleasantly surprised to learn that dd correctly handles the truncated final block of a not-multiple-of-512-byte file; could have sworn that didn't work at some point.
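A small check of that behaviour, assuming GNU dd run in the C locale so the summary line is in English: a 1000-byte file copied with bs=512 is one full block plus one partial block, reported as `1+1 records in`, and all 1000 bytes arrive.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# 1000 bytes is not a multiple of 512: one full block and a 488-byte remainder.
head -c 1000 /dev/zero > "$tmp/in"
LC_ALL=C dd if="$tmp/in" of="$tmp/out" bs=512 2>"$tmp/log"

size=$(wc -c < "$tmp/out" | tr -d ' ')
records_in=$(head -n 1 "$tmp/log")   # "<full>+<partial> records in"
echo "$size bytes, $records_in"
```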
A note: I'd recommend using tee instead of dd for that job, or adding iflag=fullblock if your dd supports it.
The thing is that dd issues a read() for each block, but it doesn't actually care how many bytes it gets back in response (unless you turn on fullblock mode).
This isn't really a problem when you're reading from a block device, because it's pretty uncommon to get back less data than you requested. But when you're reading from a pipe, it can and does happen. So you might ask for five 32-byte chunks and get [32, 32, 30, 32, 32]-sized chunks instead. This messes up the contents of the file you're writing, with possibly destructive effects.
To avoid it, use `tee` or something else. Or use iflag=fullblock to ensure that you get every byte you request (up to EOF or count==N).
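A minimal sketch of the tee approach, using a made-up bursty writer like the one that trips up dd: tee copies whatever each read() returns until EOF, so short reads don't lose bytes. Redirecting tee's stdout to /dev/null keeps the data off the terminal.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical bursty writer: the data arrives in two separate chunks.
producer() { printf 'abc'; sleep 1; printf 'def'; }

# tee writes every byte it reads to the file (and to stdout, discarded here).
producer | tee "$tmp/out" > /dev/null
# For a root-owned target the same pattern would be (not run here):
#   producer | sudo tee /dev/something > /dev/null

content=$(cat "$tmp/out")
echo "$content"   # abcdef
```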
I was about to bet on a "read fail repeat skip" cycle for dd's behaviour but, looking at coreutils' source code at https://github.com/goj/coreutils/blob/master/src/dd.c, if I'm not mistaken, dd doesn't try to be intelligent and just uses a zeroed-out buffer, so it would return 0s for unreadable blocks.
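Reproducing an unreadable sector isn't practical in a snippet, but the zero fill can be seen with conv=sync, which pads each short input block to ibs with NUL bytes. This is a sketch assuming GNU dd; on failing media the combination usually used is conv=noerror,sync, so a block that can't be read is skipped and written out as zeros instead of shifting the rest of the output.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A 3-byte input read with bs=8: conv=sync pads the short block to 8 bytes
# with NULs, the same zero fill dd applies to an unreadable block when run
# with conv=noerror,sync.
printf 'abc' | dd of="$tmp/out" bs=8 conv=sync 2>/dev/null

total=$(wc -c < "$tmp/out" | tr -d ' ')
non_nul=$(tr -d '\000' < "$tmp/out" | wc -c | tr -d ' ')
echo "$total bytes, $non_nul non-NUL"   # 8 bytes, 3 non-NUL
```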
A lot of use cases of `dd` are better served by `head -c $bytes`. `dd` does provide a lot more control if you need it, but when you don't just use head.
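As a sketch of that equivalence (GNU coreutils assumed, scratch input made up for the test): `head -c N` and `dd bs=N count=1` produce the same N leading bytes, but the dd form needs iflag=fullblock to be safe when the input can deliver short reads.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

head -c 4096 /dev/urandom > "$tmp/in"   # hypothetical 4 KiB input
bytes=100

head -c "$bytes" "$tmp/in" > "$tmp/head.out"
dd if="$tmp/in" of="$tmp/dd.out" bs="$bytes" count=1 iflag=fullblock 2>/dev/null

if cmp -s "$tmp/head.out" "$tmp/dd.out"; then same=yes; else same=no; fi
head_size=$(wc -c < "$tmp/head.out" | tr -d ' ')
echo "identical: $same ($head_size bytes)"
```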
Some years ago I picked up the habit from a predecessor of testing such things with dd instead, that way you can experiment with the effect of different block sizes, so like -
These days, my main use of dd is to get a specific amount of data from a file, where both "bs" and "count" are useful (no, "bs" does not only set the buffer size; it also sets the chunk size for reads and writes, which is SOMETIMES useful and/or necessary with tapes).
So, this is an approximation of a command pipeline I run several times per year, when I happen to need a secret of an approximate length:
dd if=/dev/urandom bs=6 count=1 | base64
Tune the "bs=6" depending on how long you want your (guaranteed typeable) secret to be. Every 3 bytes of input give you 4 characters of output, and keeping the input block size a multiple of 3 avoids output ending in "=".
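The arithmetic can be checked directly: base64 emits 4 characters for every 3 input bytes (rounded up), padding with "=" when the byte count isn't a multiple of 3. A sketch assuming GNU dd and base64; the 7-byte case is added only for contrast.

```shell
set -eu

# 6 bytes = 2 groups of 3 -> 8 base64 characters, no '=' padding.
secret=$(dd if=/dev/urandom bs=6 count=1 2>/dev/null | base64)
# 7 bytes = 2 full groups + 1 leftover byte -> 12 characters ending in '=='.
padded=$(dd if=/dev/urandom bs=7 count=1 2>/dev/null | base64)

echo "${#secret} chars: $secret"
echo "${#padded} chars: $padded"
```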
It MAY be possible to replace this use of dd with other shell commands. But, since I needed to learn enough of dd to cope with tapes, I use that.
Per the sibling comments, you just need to specify a sane block size. dd's default of 512 bytes is really low, and if you experiment a bit with 2M or thereabouts you'll get near-theoretical throughput.
NB: Remember the units! Without a unit suffix the value is taken as bytes, which is insanely small for a block size. I've made that mistake more than once!
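A sketch of both points, assuming GNU dd and an 8 MiB scratch file filled from /dev/zero (sizes chosen arbitrarily): the first part shows what a missing unit does to bs, the second runs the same copy at a few block sizes and prints dd's own summary line, which includes elapsed time and throughput.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export LC_ALL=C   # keep dd's status line in English

# Units: bs=2 is two bytes; bs=2M is 2 * 1024 * 1024 bytes.
dd if=/dev/zero of="$tmp/tiny" bs=2 count=1 2>/dev/null
dd if=/dev/zero of="$tmp/big" bs=2M count=1 2>/dev/null
tiny=$(wc -c < "$tmp/tiny" | tr -d ' ')
big=$(wc -c < "$tmp/big" | tr -d ' ')
echo "bs=2: $tiny bytes, bs=2M: $big bytes"

# Block-size experiment: dd's last status line reports bytes copied,
# elapsed time and rate; stdout is unused because of of=.
dd if=/dev/zero of="$tmp/in" bs=1M count=8 2>/dev/null
for bs in 512 64K 2M; do
    printf 'bs=%s: ' "$bs"
    dd if="$tmp/in" of="$tmp/out" bs="$bs" 2>&1 >/dev/null | tail -n 1
done
copied=$(wc -c < "$tmp/out" | tr -d ' ')
```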
Interesting assertion. Can you show me a shell invocation without using dd that cuts off the first 16 bytes of a binary file, for example? This is a common reason I use dd.
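One possible answer, as a sketch assuming GNU or POSIX `tail` (the 16-byte header here is made up): `tail -c +N` starts output at byte N counting from 1, so `+17` drops the first 16 bytes. The dd form is shown alongside for comparison.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical file: a 16-byte header followed by the payload.
printf '0123456789ABCDEFpayload' > "$tmp/in"

# tail -c +17: output starts at byte 17 (1-based), i.e. skips 16 bytes.
tail -c +17 "$tmp/in" > "$tmp/tail.out"
# dd equivalent: skip one 16-byte input block, copy the rest.
dd if="$tmp/in" of="$tmp/dd.out" bs=16 skip=1 2>/dev/null

via_tail=$(cat "$tmp/tail.out")
via_dd=$(cat "$tmp/dd.out")
echo "$via_tail $via_dd"   # payload payload
```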
but not when I omit the `count=...` for some reason (maybe it isn't showing the warning in that case because it doesn't matter - apparently the effect this has is one of the "blocks" being smaller, and thus fewer bytes being copied, but it doesn't add padding or anything stupid like that, see https://unix.stackexchange.com/questions/121865/create-rando...).
I wish we had a cat-like tool for writing into files, for the "cat foo | do-something | sudo dd of=/dev/something" use case.
dd allows you to specify your seek, skip and count values as bytes instead of blocks using the iflag/oflag options seek_bytes, skip_bytes and count_bytes.
So to read the first MB of data starting 100GB into a file, you can:
> dd if=/tmp/file1 bs=2M skip=100G count=1M iflag=count_bytes,skip_bytes
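A scaled-down, runnable version of that command, as a sketch assuming GNU dd (skip_bytes and count_bytes are GNU iflag values) and a made-up 1000-byte file: skip= and count= are taken as bytes while bs=2M still sets the read size.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical 1000-byte input: the digits 0-9 repeated 100 times.
yes 0123456789 | head -n 100 | tr -d '\n' > "$tmp/in"

offset=123   # byte offset to start at (stands in for skip=100G)
length=10    # bytes to copy (stands in for count=1M)

# With skip_bytes/count_bytes, skip= and count= are byte counts,
# not multiples of bs, so a large bs can still be used for the reads.
dd if="$tmp/in" of="$tmp/out" bs=2M skip="$offset" count="$length" \
   iflag=skip_bytes,count_bytes 2>/dev/null

result=$(cat "$tmp/out")
echo "$result"   # 3456789012
```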