The cult of dd (2017) (eklitzke.org)
289 points by _dain_ | 2022-10-24 | 241 comments




I agree that other tools may be more user friendly, but if users are blindly copying and not really understanding what commands do, then that is partly on them and partly on a lack of documentation. The Arch Wiki provides a ton of options to copy a disk image, so it is one place that does show other options.

I honestly didn't know that dd just did the same thing as cp or cat. I thought you had to use it for making bootable USBs because it was doing some kind of special arcane magic behind the scenes... this intuition was reinforced by the fact that so many GUI tools purporting to make bootable USBs are so unreliable. If literally all you have to do is copy bytes from one place to another, why the fuck do those tools fail so often?

Doing this kind of thing is a) rare and b) usually you have more things on your mind at the time, so you tend to be quite conservative and not think to experiment.


> special arcane magic

It doesn't really do anything particularly arcane (unless you count EBCDIC conversions) but it does things that are useful and often necessary that other equally-standard utilities can't. Isn't that good enough?
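For the curious, the EBCDIC conversion mentioned above is a single conv= flag; a quick sketch, with file names made up for the example:

    # POSIX conv= values ascii/ebcdic translate between the two charsets
    dd if=ascii.txt of=ebcdic.txt conv=ebcdic   # ASCII -> EBCDIC
    dd if=ebcdic.txt of=back.txt conv=ascii     # EBCDIC -> ASCII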

> Doing this kind of thing is a) rare

Probably not as rare as you think. Maybe it's not all that useful for writing applications, but for the quite large number of people who have to do provisioning (either bare-metal or virtual) and such it's a different matter. There's a lot to be said for tools that are ubiquitous, well standardized, and flexible.


I’d say that a big part of it is a lack of discoverability. The reason people copy blindly, especially if they’re new to Unix-likes, is that there are lots of commands with lots of flags, each of which you just kinda have to know about. And they work together reasonably well (except when they don’t), but going from “I need to accomplish this task” to a particular command is generally going to require a Google.

Or a man page :P

But that’s the issue — discoverability. How to get from “I want to search” to man grep or “I plugged in my USB drive and need to get at the files” to man mount/man fstab/man something else. man pages are a decent reference once you know what you’re looking for (assuming they’re up to date, correct, and present on your system) but they aren’t anything like a replacement for Google.

    $ apropos search
    ...
    grep (1p)            - search a file for a pattern
    ...

On each of the two machines I have immediate access to:

    $ apropos search
    apropos: Command not found.
Aside from the obvious, man pages just aren’t the answer most of the time.

Part of the issue is that a man page is a reference. This is a necessary kind of documentation, but sometimes I want a tutorial and sometimes I want a cookbook. References are great for some things, but discovering new tools isn’t generally one of them. Obviously, you don’t have to write man pages that way, but almost all of the ones I see in the wild are.

Another part is that man pages as a resource have really atrophied in my experience. Lots of new CLI tools don’t have them at all, and lots of systems don’t have them installed even for older/core tools. The why is harder. I suspect that it’s a mix of tooling (e.g. needing to learn groff/troff to write them), search (your man pages aren’t indexed by Google by default like your GitHub readme), and culture. But I’m not sure.


> But who cares? Why not just let the command figure out the right buffer size automatically?

Well, because it doesn't. At least the Linux version I used in the past defaulted to 512-byte blocks or something similarly small, and it commits every block individually, leading to really slow performance with USB sticks whose controllers aren't smart enough to bundle writes together. I wouldn't be surprised if that incurs some heavy write amplification at the flash level too. Perhaps it's smart enough to figure out a better block size now, but this is where that habit comes from.

Another thing: creating sparse files with the seek option (simply put, files containing zeros that are not actually written to disk and take up no space, but do return all those zeros when you read the file). Also something not duplicated by cat or head.
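A minimal sketch of that (GNU dd; the file name is made up):

    # seek 1 GiB into the (empty) output and copy nothing; the result reads
    # back as 1 GiB of zeros but allocates blocks only once you write into it
    dd if=/dev/zero of=sparse.img bs=1M seek=1024 count=0
    ls -lhs sparse.img   # apparent size 1.0G, actual disk usage ~0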

What I like about dd is that it can do pretty much all disk operations in one simple tool. Definitely worth learning all of it IMO.


>> But who cares? Why not just let the command figure out the right buffer size automatically?

I think he's talking about cat being the command that figures it out automatically


I don't read it that way. He calls out that 'weird' command specifically. But indeed he doesn't specify.

I wonder what cat does in terms of buffers. I kinda doubt it has any special optimisations, though I would guess the shell redirect might, as that's really the thing doing the work there, not cat. Edit: Nope, I'm wrong there!!

Also that command does more than just specify the memory buffer like he says. That's my point, it's useful for tuning which can be super helpful with huge images.

It can also lead to some dangerous gotchas as well when working with files. But with full disk images these don't apply generally.


> though I would guess the shell redirect might have. As that's really the thing doing the work there, not cat.

No? The shell redirection is just

    int tmp_fd = open("/dev/sdb", O_WRONLY|O_TRUNC, 0666);
    dup2(tmp_fd, STDOUT_FILENO);
    close(tmp_fd);
(plus error handling and whatnot)

`cat` ends up with a file descriptor directly to the block device, same as `dd` does; the only difference is whether the `open()` call comes before or after the `execve()` call.


Ah ok good point! I stand corrected.

I really doubt cat is smart enough to figure out a suitable block size though. At least with dd you can specify one.


> I really doubt cat is smart enough to figure out a suitable block size though

It seems to try, at least

https://github.com/coreutils/coreutils/blob/master/src/cat.c...


Going through the historical versions (copying from my other comment):

- >=9.1 (2022-04-15) : Use the `copy_file_range()` syscall and let the kernel figure it out

- >=8.23 (2014-07-18) : max(128KiB, st_blksize(infile), st_blksize(outfile))

- >=8.17 (2012-05-10) : max(64KiB, st_blksize(infile), st_blksize(outfile))

- >=7.2 (2009-03-31) : max(32KiB, st_blksize(infile), st_blksize(outfile))

- at least as far back as 1996 : max(st_blksize(infile), st_blksize(outfile))

(In my pseudo-code, `st_blksize(fd)` is the `ST_BLKSIZE(buf)` of the result of `fstat(fd, buf)`.)


Right, and the parent comment is saying that `cat` doesn't figure it out very well.

I recall that GNU cat used a constant value that was tuned for read performance on common hard drives, giving no consideration to write performance. Looking at the GNU cat sources today, it doesn't look at all how I remember; I'd have to study it a bit to tell you what it does.

Edit: Hrmm, it seems I was mis-remembering. Perhaps I was thinking of the minimum values (32KiB, 64KiB, 128KiB below)? Or perhaps I was thinking of a BSD cat? Anyway:

How GNU cat sizes its buffers, by version:

- >=9.1 (2022-04-15) : Use the `copy_file_range()` syscall and let the kernel figure it out

- >=8.23 (2014-07-18) : max(128KiB, st_blksize(infile), st_blksize(outfile))

- >=8.17 (2012-05-10) : max(64KiB, st_blksize(infile), st_blksize(outfile))

- >=7.2 (2009-03-31) : max(32KiB, st_blksize(infile), st_blksize(outfile))

- at least as far back as 1996 : max(st_blksize(infile), st_blksize(outfile))

(In my pseudo-code, `st_blksize(fd)` is the `ST_BLKSIZE(buf)` of the result of `fstat(fd, buf)`.)


No, I was talking about dd. As far as I know it does not have any smart tuning of block sizes. It definitely didn't in the past and I doubt it was added.

Not sure what cat does, but like I said in my other comment it's not really cat itself that does the disk writing in that command but rather the shell redirect. Edit: Nope, I'm wrong there!!


Ah, I misunderstood you. You're correct, `dd`'s default block size was and is 512, which is specified by POSIX.

I interpreted "the command" in the original article as "the command you end up running, whether it be `dd` or `cat` or something else."

But you're mistaken about the shell redirect. In either scenario it's the command (`dd` or `cat`) making the write() syscall to the device. The shell passes the file descriptor to the command, then gets out of the way.


Indeed I was mistaken about the shell, sorry about that. I forgot how the redirect method works a really long time ago, and my memory made it into something it was not.

In that case I guess dd does call a sync() on every output block? Because it's definitely slower and the LED pattern on a USB stick is also much more 'flashy' when using 512 bytes.


> In that case I guess dd does call a sync() on every output block?

Only if you tell `dd` `oflag=sync` or `oflag=dsync`.


For context, choosing the block size has implications on the speed at which data gets written to a USB stick, so not being able to tune that on cat can be a problem.

I guess that made sense when the standard HDD block size was 512 bytes. Then it went to 512/4K emulation, then pure 4K; not sure what SSDs do, maybe also 4K? Experimenting with speeds and block sizes in the past has shown very quickly diminishing returns from increasing the block size. As long as the CPU can keep the queue depth over 1, the device should be flat out.

Generally they do a logical 4K, but physically it's usually larger pages of more than 1M per write. A good SSD will helpfully cache and batch up small writes, but if it gets it wrong then it'll amplify the wear and kill a drive quicker than needed. That's another reason to do dd with a larger block size, since it'll make it a lot less likely that you write multiple blocks for a single update.

Good point. I guess there's so much going on with various types of caching and wear levelling that 'let the device figure it out' is best. And the queue can be on the device now with NVMe, not on the host, so it's not a dumb queue any more.


well so does dd without the conv=fdatasync option


I can confirm I've seen dramatic speed differences with block sizes in dd. I haven't tried comparing with cat, though.

I don't have data at hand but if you choose the right value it's meaningfully better and can also lead to more efficient patterns in bash scripts that are more complicated than `cat in > out`

Unfortunately dd has not just footguns but foot cannons that are amplified by the mistakes people often make with string escaping, odd file names, loops, null checking, and conditionals in bash.


> commits every block individually

It uses a syscall per block write (which would slow it down if you use 512 byte blocks instead of 8M for example), but the OS does the file buffering and final writing to the device, unless you pass the fsync or fdatasync options to dd.

Edit:

Here's writing to an old and slow stick; you can see that dd is fast, and then the OS has to transfer the buffered data onto the stick:

  dd if=some-2GB-data status=progress of=/dev/sdX bs=8M
  1801256960 bytes (1,8 GB, 1,7 GiB) copied, 7 s, 257 MB/s

  0+61631 records in
  0+61631 records out
  2019556352 bytes (2,0 GB, 1,9 GiB) copied, 378,59 s, 5,3 MB/s
And the stick is placed in a USB 3.2 port on a fast machine ;-)

Can also do `oflag=direct` and it'll just skip as much of the caching as it can.

> Can also do `oflag=direct` and it'll just skip as much of the caching as it can.

Correct, and that's one more point for dd compared to head/tail (which are fine commands by themselves).

But it wouldn't help much in my example, where I used a (very) old "high speed usb 2.0 stick" with 4 MByte/s write speed to demonstrate the difference between buffering and actual writing.


That's when it matters; with cat it would cache the entire write. If you don't mind it caching everything, having no idea how much time is left or how fast it is writing, and then waiting an hour for it to unmount instead, then cat is fine.

Another one is conv=noerror,sync to read past errors and fill bad blocks with zeroes.

If your goal is to recover as much data as possible then something like GNU ddrescue would be more appropriate.

You can also edit text files using notepad. Does that mean there is a cult around the useless vi?

Literally my thoughts too!

"Who cares?" --> people that care are people that want to have control over that option, as not every tool is written intelligently.


Indeed. dd can also serve also as an ad-hoc performance test tool (it gets a bit tricky when you need multiple threads/streams though).
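Something like this, say (device name hypothetical, and this destroys whatever is on it):

    # rough sequential write test; oflag=direct bypasses the page cache so
    # the reported rate reflects the device rather than RAM
    dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct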

Dd is super useful for incrementally writing boot loader data to /dev device endpoints when offsets are required (multiple images).

This is very common with u-boot and eMMC/microSD devices on SBC (pi clones, etc).

It’s also generally valuable to be precise and explicit when messing with boot loader data in general.

I think the author is underappreciating these use cases.
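For illustration, a sketch of the offset-write pattern (offsets and file names are made up; the real values depend on the SoC's boot ROM, so check your board's docs):

    # SPL at an 8 KiB offset, main image at 64 KiB; sector 0 (and the
    # partition table in it) is left untouched
    dd if=u-boot-spl.bin of=/dev/mmcblk0 bs=512 seek=16 conv=notrunc
    dd if=u-boot.img of=/dev/mmcblk0 bs=512 seek=128 conv=notrunc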


# cat file.img > /dev/sdaX

works, fyi



Um. Does it? The stated use case was:

> Dd is super useful for incrementally writing boot loader data to /dev device endpoints when offsets are required (multiple images).

How exactly does one use cat to overwrite only a specific offset? Say I have a 512-byte bootloader that needs to be placed starting at 2048 bytes into the image; how would I invoke cat to do that?


No it doesn't; what's being done here is inserting a binary block into the middle of a binary file (or disk) without overwriting the data before or after it.

Yea I came here to write the same. The author obviously works in a different environment where such operations are not standard.

The ability to specify that an exact number of bytes should be written or read at a given location is important, more so on special files, like device files. Even provisioning an SD card with the image block-aligned is crucial, and that is not speaking of very early boot loaders that need to be available at a specific byte offset.

This discussion hopefully educates the public that these kinds of tools are foundational and their use cases should be learned:

You never know when you need to read a special byte offset in a hacky one liner in a debug environment. Analysis productivity matters, and knowing analysis tools is key for that.


The major features of dd are the skip=N and count=M parameters. Combined with ibs=1 you can slice the file into any parts in your script as needed.

cat can't do that.
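A small sketch of that (file name hypothetical; note the performance caveat in the reply below):

    # extract 16 bytes starting at byte offset 54, one byte per input block
    dd if=file.bin ibs=1 skip=54 count=16 2>/dev/null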


As mentioned elsewhere in this post, this leads to dreadful performance. You are better off using the iflag/oflag options seek_bytes, skip_bytes and count_bytes to treat these values as bytes, and using a large block size.

Hm, I don't see any details on that in the man page. What's the logic there?

skip, seek and count default to counting the number of blocks. So if you want megabytes 50-60 of a file, you can either use a 1-byte block size:

> dd if=/tmp/file1 bs=1 skip=50000000 count=10000000

or you can use any block size you want and treat skip and count as byte counts, not block counts.

> dd if=/tmp/file1 bs=4M skip=50M count=10M iflag=skip_bytes,count_bytes

Option 1 will run at Kilobytes per second as it is transferring 1 byte at a time (my test gives me 933kB/s)

Option 2 will run at 100s of MB per second as it is transferring 4MB at a time (my test gives me 202MB/s).

Edit: What I should have said at the top of the post was "if you want to extract data from a file whose size is not an integer block multiple". In the above example you could have used a block size of 10MB for the same result. You cannot do this for an oddly sized extraction (say 1234567 bytes).


Interesting, thanks for the hint!

Why doesn't the man page say anything about these flags? They are completely obscure because of that.


I only found out because I use dd 1000's of times each day and have had to prototype some really esoteric data pipelines using it. I have even compiled a few custom versions to do various tasks.

One other nugget of wisdom which is very poorly documented in the man page: if you have a long-running dd process (I often have 12+ hr dd processes reading 10+TB tapes) you can send a USR1 signal to check the progress. Very useful to check if your tape drive has had a hardware failure.

Information on both this and the byte counts is in my man page (GNU coreutils 8.30 on Linux Mint 20) around line 130.
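For reference, checking on a running dd from another terminal looks something like this (GNU dd on Linux; as discussed elsewhere in the thread, BSD dd wants SIGINFO instead, and USR1 will kill it):

    # ask every running dd to print its I/O statistics so far
    kill -USR1 $(pidof dd)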


I knew about USR1 and it's in the man for me too, but I don't see anything about skip_bytes / count_bytes flags, which is strange.

Debian testing, coreutils 9.1.


I am running coreutils 8.30 on Linux Mint 20.

It is also in Coreutils 8.22 in RHEL7.

https://man7.org/linux/man-pages/man1/dd.1.html

Edit: I just found this in the coreutils 9.0 changelog:

  dd now counts bytes instead of blocks if a block count ends in "B".
  For example, 'dd count=100KiB' now copies 100 KiB of data, not
  102,400 blocks of data.  The flags count_bytes, skip_bytes and
  seek_bytes are therefore obsolescent and are no longer documented,
  though they still work.

Thanks for the pointer.

So you can simply do something like:

   dd if=foo skip=nB count=mB bs=4M of=bar
To get m bytes from offset n.

> more standard Unix tools

Sure sign that the post wasn't researched at all, since dd is one of the very oldest UNIX utilities and is even in the POSIX standard.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/d...

The "pv" command they recommend instead is, by contrast, not standard and is nowhere near universally available. Also, as just about any not-a-noob knows, the author's "cat" suggestion will not work on devices that expect whole blocks, except perhaps by accident ... and it's not a good habit to rely on accidents. Options like skip and seek also allow you to do things like write a single block at an arbitrary location, which can be very useful sometimes and is something cat/head/etc. can't do.

Reacting to anything one doesn't understand, without even trying to understand it, as strange or obscure or weird (all their words) is a very deeply incurious attitude. That kind of crap doesn't even belong here, per guidelines.

P.S. Hey kid, learn how to use a proper link instead of a Wikipedia title.


I think that the article meant "standard" in the colloquial sense, not in the POSIX sense.

To the extent that "standard in the colloquial sense" even makes sense, it's still wrong because "head" (the proposed alternative) is no more common or well understood than dd, and "pv" is far less so.

That probably depends on the user. I use head every day, I can't think of the last time I used dd. Agree with you on pv, I don't believe I've ever used it - I've seen it around occasionally but it's never top of mind the rare time I might want it

Using standard as an adjective to mean "used or accepted as normal or average" or "usual, common, or customary" makes a lot of sense. I just came back from touring my buddy's new woodshop. When people ask how it looks I will say "he has your standard set of woodshop tools." Nobody is going to wonder if it's an ISO, or ANSI standard set of woodshop tools.

If we go by your definition, then 'dd' is absolutely "standard", and 'pv' absolutely is not.

The point here is that dd is the weird one out that has a different command line syntax than everything else. Not about whether it's installed by default. (Let's not get distracted by the pv bit; dd's status=progress isn't POSIX either)

I don't believe it's that. On macOS Monterey:

  $ which pv
  pv not found
I've been using Unixes of various flavours since the 1980s and had never heard of it until I read this article. It's certainly more obscure to me – as in not discoverable at all – than dd status=progress.

Apparently I've been slumming it by watching long jobs using Activity Monitor on macOS.


I don't have 'pv' installed on my Debian system, and had never heard of it before. 'dd' is installed by default and I've been using it for decades.

The cult of cat? The cult of pv? Really, we just need a progress meter that says "don't bother getting up", "get lunch now", or "come back tomorrow", but I don't know that there is one.

For bulk copying of /dev/zero to /dev/null, `dd` is fastest (block sizes of 256kB up to 4MB are about the same speed), then `pv` (block sizes of 256kB up to 4MB are about the same speed), then `head`.

But it's true, unless you're dealing with an actual tape drive, there aren't a lot of things in the world anymore that depend on a specific block size; the idea that you'll get a faster transfer if you (say) exactly match the erase block size of your SD card doesn't seem to hold much water anymore.

And debian's bash-completion of `dd` has been broken for several releases, I despair of it ever working right again.


> Really, we just need a progress meter that says "don't bother getting up", "get lunch now", or "come back tomorrow", but I don't know that there is one.

It feels like this is gonna lead to another thing like thefuck...


mbuffer does the job

> we just need a progress meter that says "don't bother getting up", "get lunch now", or "come back tomorrow", but I don't know that there is one.

Doesn't `status=progress` work? (The "obscure option to GNU dd" that the article mentions)


On BSD systems you can press ^T for SIGINFO and that prints some progress information, at least for tools like dd and cp, not sure about cat and I don't have a BSD system to test. Linux doesn't have this unfortunately. GNU tools often support it (when run on BSD), it's just that the kernel doesn't.

On Linux you can do a 'kill -USR1 $(pidof dd)' to have dd print a progress report.

> The idea that you'll get a faster transfer if you (say) exactly match the erase block size of your SD card doesn't seem to hold much water anymore.

No but if you use a much lower block size with cheap flash it does get horribly slow.

Makes sense because cheap eMMC does not have cache RAM to combine write commands and no NCQ (native command queueing) either. So it has to execute each write as it gets it. I bet you can kill flash pretty quickly this way with a block size of 1. The write amplification will be huge.

The problem with the cat method is that you don't really know what it's doing under the hood in terms of write sizes. Probably it will pick something smart but it depends on the OS there and possibly the shell.


Assuming you're piping between programs, you could try pipeviewer:

https://www.ivarch.com/programs/pv.shtml


1) cat feels like a text/line-oriented program. It is not obvious that it is even supposed to work efficiently on binary data.

2) (GNU) dd has support for progress bar and even if not enabled, one can get statistics by sending USR1 signal to it.


> (GNU) dd has support for progress bar and even if not enabled, one can get statistics by sending USR1 signal to it.

And the BSDs have siginfo, so at least most of the Free OSs are covered.


I forget which way it goes, but if you accidentally send an INFO to non-bsd dd (or maybe it is USR1 to bsd dd) it conveniently dies.

I believe the way it goes is that USR1 kills BSD dd.

The Linux kernel doesn't have SIGINFO. GNU dd uses SIGINFO on platforms that have it, or SIGUSR1 otherwise. The default action for SIGUSR1 is to kill the process. So it makes sense that on platforms that do have SIGINFO no one would bother to override that default SIGUSR1 behavior.


I think it's USR1 to bsd dd? SIGINFO doesn't exist on Linux from what I remember, but I think the default behavior for processes is to terminate when they receive a signal they don't catch, and I can't imagine many programs have explicit handlers for USR1, especially if SIGINFO is an option

The default handler for SIGINFO is a non-fatal process summary, like this (from FreeBSD):

    load: 0.15  cmd: sleep 52109 [nanslp] 0.27r 0.00u 0.00s 0% 2132k
    mi_switch+0xc2 sleepq_catch_signals+0x2e6 sleepq_timedwait_sig+0x12 _sleep+0x1d1 kern_clock_nanosleep+0x1c1 sys_nanosleep+0x3b amd64_syscall+0x10c fast_syscall_common+0xf8
SIGUSR1's default is to terminate the process, which makes it awkward to use as a SIGINFO alternative on platforms without it.

> (GNU) dd has support for progress bar and even if not enabled, one can get statistics by sending USR1 signal to it.

The OP knows that. The article says that with pv, you have something that works for every command instead of needing to remember dd-specific syntax.


I find it very strange that you think it's not obvious that cat works with binary data. It's just a "dumb" program that reads from some files and sends it to stdout, it doesn't care what the data is!

One of the intended use cases of cat is reassembling files after they have been split with the "split" command!


I just recently used dd to binary patch some files. I needed to read n bytes from offset p in file1 and write them to offset q in file2:

  dd if=file1 of=file2 bs=1 skip=p seek=q count=n conv=notrunc
Very useful.

Same here, a few times over the years. Another useful trick is piping input from echo to xxd -r -ps to dd so you can specify hex bytes directly in the echo arguments rather than reading input from a file. Quite handy if it's an embedded system that you can't easily transfer files to, but can get a shell on.
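Roughly like this (offset and bytes made up for the example):

    # patch 4 bytes in place at offset 0x1C4, no input file needed
    echo deadbeef | xxd -r -ps | dd of=firmware.bin bs=1 seek=$((0x1C4)) conv=notrunc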

Ha! _Very_ cool. Thanks for sharing!

Recommend `gzip -9c file | base64 -w $COLUMNS` and then `base64 -d | gzip -d > file` if you are going to transfer files by copy and paste.

That's going to be very slow though, as it will read and write everything one byte at a time. The problem of course is that you can't specify seek/skip/count values as bytes unless your block size is 1 byte.

If you don't need seek then you can at least use ibs=1 instead, as skip/count operate on input blocks, but this will still read one byte at a time even though it will aggregate the reads into larger output blocks.

It would be really nice if we had a dd2 tool that offered similar options but defaulted the block size to a new "auto" value, treated seek/skip as byte values, and treated count as a byte value if the input block size is "auto".


Ah good point. In my case it was only 4 bytes, so no issue with slowness.

dd does all you want without a new version.

dd allows you to specify your seek, skip and count values as bytes instead of blocks using the iflag/oflag options seek_bytes, skip_bytes and count_bytes.

So to read the first MB of data starting 100GB into a file you can:

> dd if=/tmp/file1 bs=2M skip=100G count=1M iflag=count_bytes,skip_bytes


Which 'dd' is that? Coreutils 9.1 on Debian testing here, and the 'dd' manpage doesn't mention count_bytes or skip_bytes at all.

I am running coreutils 8.30 on Linux Mint 20.

It is also in Coreutils 8.22 in RHEL7.

Edit: https://man7.org/linux/man-pages/man1/dd.1.html

Edit 2 : I just found this in the coreutils 9.0 changelog:

  dd now counts bytes instead of blocks if a block count ends in "B".
  For example, 'dd count=100KiB' now copies 100 KiB of data, not
  102,400 blocks of data.  The flags count_bytes, skip_bytes and
  seek_bytes are therefore obsolescent and are no longer documented,
  though they still work.

I rarely use dd, but my most common use might be:

    blah | sudo dd of=/some/file
where:

    blah | sudo cat > /some/file
wouldn't work.

    blah | sudo tee /some/file > /dev/null
would though. Both are probably fine.

If 'blah' outputs binary data, then that would work, but would spew garbage to the terminal. Piping to 'dd' is probably more efficient regardless.

But this one would work:

blah | sudo sh -c 'cat > /some/file'


There's no need to pipe via pv when you can just use dd's status=progress option.

Or if you forget that, you send the dd process a USR1 signal and it will print the status.

Though if you’re simply initializing from /dev/zero you’re probably better off with truncate to create a sparse file: https://man7.org/linux/man-pages/man1/truncate.1.html

Can’t beat a program that doesn’t even write the bytes.


Or `fallocate` if you care about the space actually being allocated on disk (rather than the file being sparse).
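Side by side (file names made up):

    truncate -s 10G sparse.img      # sparse: instant, no blocks allocated
    fallocate -l 10G reserved.img   # also instant, but the space is actually reserved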

One of the nice things about `dd` is that the arguments can be in any order, they're explicitly given mnemonics. You can have `cp` ordering with `dd if=<source> of=<destination>` or `ln` ordering with `dd of=<destination> if=<source>` and it doesn't matter, `if` is always the input file and `of` is always the output file. The other POSIX tools don't all have consistent syntax anyway, `dd` is at least more memorable than `find` or `tar`.

Actually, ln takes arguments in the same order as cp and mv. It's always `cmd existing-file... new-name`.

ln only gets confusing because the file that exists is the target of the link, so it may be natural to think of the link as going from the last name on the command line to the earlier ones, but really the mental model should be in names that either exist or are to be created.


True, if you develop the mental model you described, instead of the `cmd source destination` model.

The `help` text for GNU coreutils `cp` starts with `Usage: cp [OPTION]... [-T] SOURCE DEST`.

The `help` text for GNU coreutils `ln` starts with `Usage: ln [OPTION]... [-T] TARGET LINK_NAME`.

The man pages & help text aren't conducive to building the `cmd old new` mental model. `dd` makes it more explicit.


To me dd's noerror flag as well as seek are very comforting to know of whenever I'm on a quest to preserve some precious data from any medium currently in failure mode ... any extra read might be your last.

dd is a beautiful little swiss army knife and I'm not ashamed to be part of the "cult" lol. I have had to use ddrescue too though. It's handy to have in the toolbox if you have a disk with errors and want to get data off of it.

But how to reconcile this advice with the "unnecessary cat" zealots. Just let me live my life guys.

Just use pv, you don't need cat with it like shown in the article:

    pv some.iso >/dev/sdb
Or if you don't care about progress bar, you can use cp:

    cp some.iso /dev/sdb

I didn't check if it works for writing to a volume, but wouldn't this simple input work?

  < some.iso > /dev/sdb

Redirection isn't very useful here without a command. Put a 'cat' in front of it, for example.

Have you never tried it?

    % cat > foo
    fred
    barney
    wilma
    betty
    % < foo > bar
    % cat bar
    fred
    barney
    wilma
    betty
    % diff foo bar
    %

interesting! TIL

If you look at https://news.ycombinator.com/item?id=33334043 it turns out rascul was right and it appears to be only zsh that handles the reading of the redirected input without some other command to hand it off to.

I saw all that. Still, was interesting.

I just happened to explore this on an M1, so I had a shell that would do things this way.


I wasn't able to do that with bash. Not at the computer now though so I can't try it again to see if I had screwed it up.

Update: Back at the computer now, and I cannot replicate it with bash:

    rascul@smarts:~/mm> cat > foo
    fred
    barney
    wilma
    betty
    rascul@smarts:~/mm> < foo > bar
    rascul@smarts:~/mm> cat bar
    rascul@smarts:~/mm>
I'm curious what shell you did that with.

That's in zsh. Yeah, I guess there are shells where it won't work.

I think it's a zshism. I can't seem to do it in dash either.

I've since tried in two versions of Bash (3.2 and 5.1) and in five different varieties of ksh (oksh, mksh, ksh93, pdksh, GNU ksh). It does seem to be a zshism. All the others do the redirect into the destination, but only zsh provides the content of the redirect from source without some other command to read it.

The zsh way seems more properly fitting with the philosophy of the Unix shell to me. It would be an uphill battle getting everyone else to change that behavior, though.


I've been using this for a long time now and didn't know it was a zshism, because it felt so simple to use and Unixy. TIL too, thanks!

> This is a strange program of [obscure provenance] ... that somehow, still manages to survive in the 21st century.

Perhaps it "survives" because people have been using it for decades for specific tasks for which it still still works just fine. Just because there are other ways to achieve the same results for some of the tool's use cases doesn't mean that said tool should be done away with. If you don't like it no one is forcing you to use it, and calling those who still use dd "The Cult of Dd" is just ridiculous, in my opinion.


For one, dd is a hell of a lot less esoteric than what this blogger is suggesting

since when did dd need a block size?

This might be a GNU vs BSD thing. But even if dd doesn't need a block size, sometimes (at least in the past) the default block size is not optimal.

I read the title as Cult of the Dead Cow and had a flashback to my misspent youth.

CDC was maybe more worthwhile than other childhood mischief.

Seeking a file descriptor is a basic, useful operation, and dd is the only POSIX tool that can seek an fd without actually paying the read / write costs.

Suppose you have 2 hours of UHD RGBA32 video, and need 5 minutes of footage from the halfway mark:

  dd if=in.raw bs=$(( 3840 * 2160 * 4 * 24 )) skip=3600 count=300 | ffmpeg ...
This will be a lot faster than pointlessly catting the first 3 terabytes!

Here's a variation. Suppose the video file has a 1M header you need to skip:

  { dd bs=1M skip=1 count=0; dd bs=796262400 skip=3600 count=300; } < in.raw | ffmpeg ...
The first dd invocation does nothing more than seek stdin ahead 1M, so the second can operate on full 1 second chunks of video. Useful!

Now, are there Uncalled For Uses of dd, just like there are Useless Uses of cat? You bet.


tail can seek without reading too, but only on regular files

Can tail seek without then writing the rest of the file to stdout?

Well, if you want to limit the amount of data you are getting from a command you need `head`:

tail -c +STARTOFFSET $FILE | head -c MAXLENGTH

There is more than one way to do stuff with UNIX tools.


Does the head call attached to the pipe stop tail from reading the rest of the file after STARTOFFSET+MAXLENGTH?

Yes, head exits when it has output enough. This causes tail to also call it quits, as it has nothing to write to.

It does. tail will get SIGPIPE and exit. https://stackoverflow.com/q/8369506

But tail might read up to a pipe buffer's worth more data than is actually required. Also, the tail + head approach has the overhead of copying data between processes.


That’s why tail | head isn’t a reliable way to seek an fd — it will seek past the desired offset by reading and then failing to write up to PIPEBUF bytes.

ffmpeg is your friend for media files, in case anyone gets ideas.

If the specific job can be done with dd with your eyes closed, why would you spend 40 minutes reading the ffmpeg man page ...

Because it takes 5 mins at stackoverflow, and the result is more likely to be usable.

Also, as in your example, it's a million times easier to make bs your frame size so you can use skip and count to slice frames as you wish. I use this all of the time working with raw video files.

Why not use a video native tool like ffmpeg ? I'm unclear on what the advantage of dd is here.

You could seek to second, keyframe, etc, and it would continue to work for formats that don't have fixed frame sizes.

It's true that it is quite a lifehack if you often seek to a frame in raw video, though.


The advantage of dd in this case is that dd is designed for exactly this use case of having a file (or character device, ie. magnetic tape) with some kind of fixed-size records.

Because it's bespoke raw video, ffmpeg wouldn't know what to do with it. Sure if I'm chopping up an MP4 then dd isn't going to 'cut' it ;P

Not to detract from the point, but if you are using ffmpeg to process raw video files, you can just use the "-ss" parameter on input and "-t" on output to get what you want. It'll seek and avoid unnecessary reads.

  ffmpeg -f rawvideo -r ntsc -s 1920x1080 -ss 3:00 -i some_file -t 5:00 output_file

A weird personal discovery was that `openssl x509` used as a conversion tool actually reads the first PEM certificate and leaves stdin open and positioned just past it (rather than silently discarding any trailing data the way I always thought it did), so if you pipe or redirect a bunch of certs into a shell loop (as opposed to a command inside the loop condition) you can actually decode them one by one.
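A minimal sketch of that trick, assuming certs.pem holds several concatenated PEM certificates:

    # each openssl run consumes exactly one cert from the shared stdin and
    # leaves the offset just past it; the loop ends when the input runs out
    while openssl x509 -noout -subject 2>/dev/null; do
        :
    done < certs.pem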

Without the direct flag dd is pretty pointless. However, with the direct flag, you do unbuffered I/O with a precisely controlled block size. It's the fastest way to ensure the data really goes where you want it to go, in the fashion you like it to be, right at the moment.
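E.g. (device and file names hypothetical):

    # unbuffered 1 MiB writes straight to the device
    dd if=image.bin of=/dev/sdX bs=1M oflag=direct status=progress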

Right!

Apparently Linux has a facility for 'optimal IO size' of block devices (see 'lsblk -o NAME,OPT-IO') but on my system I only have a value for Linux md RAID devices (which happen to be RAID0, and the OPT-IO value is the stride). All of the other devices have OPT-IO 0.

Perhaps more work needs to be done to bubble some value up from the hardware.


One fun feature I like in dd is the ability read/write from block offsets.

Need some random data quickly?

dd if=/dev/urandom count=1024 bs=1 | base64

Need it to be in a particular subset of characters?

dd if=/dev/urandom count=1024 bs=1 |tr -dc 'A-Za-z0-9'

Note: if you're on a mac, its tr is kind of broken. Do export LC_ALL=C first.


I find head easier for these, replace the first part of your commands with

head -c 1024 < /dev/urandom


The shell redirection < can be removed too, since head supports specifying files [from manual page HEAD(1)]:

head [OPTION]... [FILE]...


That just prints a bunch of garbage characters for me. I tested it on mac and Ubuntu xenial. Maybe it works differently elsewhere.

You still need to pipe through base64, i.e.

head -c 1024 /dev/urandom | base64


You can specify LC_ALL= on the same line, right before tr. Here's another example of a password generator (without dd):

  LC_ALL=C tr -dc 'a-zA-Z0-9-_\$' </dev/urandom | fold -w 20 | sed 1q

I'm fond of ddrescue, and I'll note pv could have replaced cat in one example.

Yeah I usually just use that for any dd needs, and it has progress bar by default

I almost always use ‘dd’ even though there’s better options because it plays friendly with sudo, unlike pipes and io redirects.

Can you even use ‘>’ io redirects with sudo?


echo hi | sudo bash -c 'cat >/tmp/hi'

Use tee instead of a redirect?

Meh, "cult" is mean and unnecessary. So people use it as a swiss army knife? Does it fail? bs=4M is useful will work fairly universally on various systems +/-10% variance of time. It's always nice to know new ways but why insult people who like dd?

This thing pops up from time to time here and I'm still trying to figure out how to emulate seek and skip options with other posix commands.

Some options for dd do matter. I have seen it substantially increase throughput while mirroring disks. Playing with the block size can make a large difference (disk block size alignment).

I'm not sure how you can seriously advocate for ditching 'dd' by calling it "obscure"

I had an SSD start failing writes, which quickly made it fail fsck, so I couldn't boot. I rescued pretty much all the data that mattered with dd. The arbitrary offset for starting the copy was essential, because the failing SSD would occasionally fail reads in the middle of copying.

That would be a good use case for ddrescue, which attempts to deal with bad blocks. I actually usually just wind up using it instead of dd for dd-like needs, as it has progress by default.

dd is the only tool I know to make sparse disk images.

Eg:

    dd if=/dev/zero of=image.img bs=1k count=1 seek=999999
Will give you a 1GB file of zeros that takes up 1KB of disk space until you start to write to it, at which point it will transparently "expand on write".


But you could use truncate to create the file.

There's probably a way to do that with fallocate, I see it has a -d option to make "the file sparse in-place, without using extra disk space".

    truncate -s 1g image.img
Or fallocate if you want it to complete instantly without disk writes but still allocate the space.

My favorite dd(1) story:

Back in 1979 I was an intern at NBS (now called NIST), in their computing standards division. Amongst other machines, we had a couple PDP-11 systems running v6 Unix (it was glorious).

The smaller system had removable 5MB disk packs, and we did periodic backups with dd to a spare disk. It was horribly, horribly slow, and we lived with it for months until someone realized that

    dd bs=10 ...
wasn't copying ten blocks at a time, it was copying ten bytes at a time. Whereupon backups got LOTS snappier.

(That was a fun position. One of the resident Unix gurus handed me a copy of Lions' notes on my first day and basically said, "Read this and see me later", and I got to maintain and build out a whole lab full of microcomputers).


Chesterton's Fence:

There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”


On the other hand, often the fence made sense before the building next to it got demolished decades ago and nobody remembers that there even was a building.

Ideally you'd want to discover that the building was there, then confidently tear down that fence, but having a fence that nobody dares to touch because the building has been forgotten and we can't find why the fence was even there isn't great either.


Why dd and not pv? dd is bundled with most distros, while pv you need to install before you can use it. You can't always just go and install any packages you want on production servers.

While importing a DB dump, it's easy to see the progress with dd:

dd if=my.sql status=progress | mysql mydb


Discussed at the time:

The Cult of DD - https://news.ycombinator.com/item?id=13896675 - March 2017 (171 comments)


Every command listed in this article is bad.

    # Cat version with progress meter
    cat image.iso | pv >/dev/sdb
The progress meter here is mostly meaningless. You'll see the initial progress go very quickly (because you're writing to in-memory cache), and once it's "done", you'll have to wait some amount of additional time for a final `sync` to complete (and if you forget to do that, you might remove the drive while writes are still in progress).

The best way to write an image to a drive is like so:

    dd if=foo.iso of=/dev/sdx bs=4M oflag=dsync status=progress
`oflag=dsync` bypasses the write cache, so your progress bar is actually meaningful. It also guarantees that writes are actually completed[1]. Yes, that 4M block size may be improved by manual tweaking, and it would be nice if that happened automatically. I'm sure tools to do this exist, but they're not installed ubiquitously by default. Older versions of dd do not support `status=progress`, and as a workaround you can do:

    pv foo.iso | dd of=/dev/sdx bs=4M oflag=dsync
(alternatively, you can set up a bash for-loop that periodically sends SIGUSR1 to dd)
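One possible shape of that (a while loop rather than a for loop), assuming dd was started in the background:

    dd if=foo.iso of=/dev/sdx bs=4M oflag=dsync & pid=$!
    # each USR1 makes dd print its stats; kill fails once dd exits, ending the loop
    while kill -USR1 "$pid" 2>/dev/null; do sleep 5; done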

[1] Unless the drive has onboard dram cache etc., but this is rare for removable media

P.S. If you use "/dev/sdx" in example commands, it will fail when someone blindly copy-pastes without reading anything, instead of erasing their whole OS


I've started to do this as well. BSD has SIGINFO (^t) for dd status, but unfortunately linux lacks that

GNU dd uses SIGUSR1 on Linux. Yes, it's unfortunate for people who move back and forth between them, especially when newer to one or both.

I use pv directly to write files to devices using

    pv <image.iso >/dev/sdb

> The progress meter here is mostly meaningless. You'll see the initial progress go very quickly (because you're writing to in-memory cache), and once it's "done", you'll have to wait some amount of additional time for a final `sync` to complete (and if you forget to do that, you might remove the drive while writes are still in progress).

A bigger reason to avoid `dd` is unintuitive, "incorrect" behavior in some edge cases.

I don't remember the exact conditions that trigger it, but `dd` without `iflag=fullblock` can result in

    dd: warning: partial read (16384 bytes); suggest iflag=fullblock
I'm able to reliably trigger this with

    $ cat /dev/zero | openssl enc -aes-128-cbc -pbkdf2 -k foo | dd status=progress bs=1M count=100000 of=/dev/null
but not when I omit the `count=...` for some reason (maybe it isn't showing the warning in that case because it doesn't matter - apparently the effect this has is one of the "blocks" being smaller, and thus fewer bytes being copied, but it doesn't add padding or anything stupid like that, see https://unix.stackexchange.com/questions/121865/create-rando...).

I wish we had a cat-like tool for writing into files, for the "cat foo | do-something | sudo dd of=/dev/something" use case.


For writing into files I usually reach for “sudo tee /dev/something” but there is also this:

    … | sudo cp /dev/stdin /dev/something

Huh. That seems to work. I would have expected that to create a copy of the device file, i.e. /dev/something becoming synonymous with /dev/stdin.

`tar` used to control tape drives too, not news.

`dd` has purposes, so the author must not do anything real:

0. Clearing volume tables

1. Creating files of fixed sizes

2. Varying block sizes

3. Partial reads

4. Unaligned reads

5. Differing block sizes

6. Character set translation

If coreutils' `cp`, `mv`, `dd`, and their friends were modified to support `sendfile(2)` on Linux, these would then use `splice(2)` zero-copy kernel transfers under-the-hood. Already, they're likely hitting caching at some layer but there are advantages when copying between 10+ GbE network devices and/or tmpfs. Furthermore, the major shells `bash`, `zsh`, `fish`, `busybox`, `dash`, and `tcsh` would be advised to use `sendfile(2)` where possible. There's really no need to play bucket-brigade "fire drill" with data when there's a firehose that's free.


Clickbait. Right tool for the job, people.

The article could have been shortened to "dd is old and I don't like the syntax, so you should stop using it"

> its highly nonstandard syntax

Other than the missing dash/double-dash, what exactly is "highly nonstandard" about `dd`?

"f" has been a shortcut for "stream" for decades

    fprintf
the `arg=value` syntax is common to many command line tools

    grep --color=auto
; and i/o are literally the abbreviations for, well, Input/Output. So we have "input-stream=" and "output-stream=". How is that "highly nonstandard"? Because the dashes are missing?

> But otherwise, try to stick to more standard Unix tools.

`dd` is literally part of POSIX, it doesn't get more standard than that;

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/d...


I hope this does not end up being the beginning of the end for dd out of misguided minimalism (as with egrep and fgrep). Skipping input / output data is of course one of its main uses.

> There's an obscure option to GNU dd to get it to display a progress meter as well. But why bother memorizing that? If you learn the pv trick once, you can use it with any program.

On macOS and BSDs it's as easy as pressing ctrl-t to send a SIGINFO signal to have dd show the status. On Linux it's a tiny bit more complicated indeed. But knowing how to send signals to running processes is also a trick worth learning.


The author neglects to mention any of the use cases where "dd" really is the best alternative. Such as when you need to copy a bootloader from an existing image without damaging your partition table:

    # My partition table is in sector 0 and my Linux partition starts at sector 62500
    dd if=random-linux-image.img of=/dev/mmcblk0 bs=512 count=62499 seek=1 skip=1

How do you make "cat" seek 512 bytes into its stdout before writing? You can't. And even if you used a temporary file, I think the above is better than a pipeline of "head" and "tail" with uses of $(()) to convert from sector to byte units.

Once you have learned to use the tool in such situations a few times, you mentally start to associate it with "disk operations". So you start to use it also in simpler situations where simpler tools would be sufficient. You start teaching its use to others. I don't think it's that strange, or bad in any way.


You can use head as stated in the article:

    sudo dd if=/dev/sda of=/tmp/boot.1 bs=512 count=1
    sudo head -c 512 /dev/sda > /tmp/boot.2

    md5sum /tmp/boot.1
    06d6f2aa3e7c33ad06282f70aa4e133b  /tmp/boot.1
    md5sum /tmp/boot.2
    06d6f2aa3e7c33ad06282f70aa4e133b  /tmp/boot.2


You missed a nuance in the GP's problem set:

> dd if=random-linux-image.img of=/dev/mmcblk0 bs=512 count=62499 seek=1 skip=1

is not the same as:

> sudo dd if=/dev/sda of=/tmp/boot.1 bs=512 count=1

You are missing the "seek" and "skip" arguments which the GP challenges you to make `head` do.


You're right. To skip you can pipe through tail.

sudo head -c 1024 /dev/sda | tail -zc 512 > /tmp/boot.2

Read 1024, keep last 512


What if the position is not at 512 but at 512G?

Not tested, but I think the performance will be in the same order. Would be nice to do a benchmark test.

No, it is O(n) vs O(1). Why would you read 512G + 512 bytes if you only need to read 512 bytes at offset 512G?

I think you can just invert and do tail first

sudo tail -zc +513 /dev/sda | head -c 512 > /tmp/boot.2


Well now you're seeking to the offset (tail is).

That "sudo head" only works for reading, not writing though.

I use dd when redirect doesn't have permission.


I've seen tee used for that, but I myself stick to dd (out of habit or because I belong to said cult).

A generous interpretation of the title might be that the author is only writing about those for whom dd is so ingrained in their style that they use it where another command would be more natural, hence they're "cultish".

dd can deal quite well with bad sectors (as in, zero-fill them)

This post is just pure ignorance of what dd is about. Try messing around with operating systems and not just being the cool guy with a terminal and you'll see that dd is useful.

dd is pretty standard if you know it. Just because it isn't a standard to you, doesn't mean it isn't for everyone.

There are a lot of commands I don't use regularly that I saw once, which appeared strange to me. There were different ways to do what they did. Yet I didn't feel compelled to write a salty blog post about it.


I think the author's point is that dd is a specialised command, and for most common tasks (where the precise details don't matter), it's often not the best tool for the job. I didn't come away with the impression that you should never use it.

Is there a need for a blog post saying a command has a specific usage, given that the Unix philosophy is exactly to have many composable specialised commands?

People can blog about whatever they want. I think people on HN and other platforms often forget: Just because the content ended up here, doesn't mean it was written for here.

Good point. Alas, I sometimes find highly upvoted posts not that interesting, like this one. Almost as if there were some kind of bot, or as if people upvote uninteresting stuff sometimes. Whichever best fits Occam's razor.

With the unnecessary cat he's limiting his read()/write() sizes to the pipe size (default on Linux is 64k).

Hmm, dd is some kind of shell user's swiss army knife, and sometimes even the swiss army chainsaw ;-) and such tools need a bit of learning to use properly. Actually, this "swiss" metaphor is not fair; dd is more similar to a precision tool.

You read about these fake SD cards or USB sticks on sale, where the controller asserts a much larger memory size than the card actually has? You can check such cards with dd, e.g.:

  for i in $(seq 1 9999) ; do echo -n "$i "; echo "Record Number $i" | \
    dd ibs=1K count=1 obs=256K seek=$[i-1] cbs=4K conv=block,sync of=/dev/sdX
    if [[ $? > 0 ]] ; then break ; fi
  done
This writes numbered and blank padded blocks (due to the cbs=...) at the start of every 256K block of a device. Adjust your parameters to taste and match the purported size of your card or stick and later retrieve the first block to see if it contains "Record Number 1" or some other number due to wrapping around during writes. (and sure, ibs and count can be removed in the above example, I added them to demonstrate useful options in case the input isn't a simple echo command)

Anyway, I'd like to see how the author would handle all those blocking & unblocking, conversion, padding or sync requests available with dd with his head/tail approach. And what about the syntax? OK, it was a joke due to IBM's JCL, but I had to learn infix, postfix and even prefix notation for math (i.e. 5+3, 9! or f(x,y) and integrals and ...), which sometimes also looks like JCL. And I have to remember whether I need to use -c or -m even for simple tools like wc.


I disagree with "There's an obscure option to GNU dd to get it to display a progress meter as well".

The proposed alternative is to use the program "pv". In my opinion, the program "pv" deserves the epithet "obscure", not dd.

I have been using Linux for decades, but the program "pv" has not been installed on any of the systems that I have used.

On the other hand, dd is a part of coreutils, so it is always available.

I frequently use dd when writing on raw devices precisely for the progress option, because the images written may have sizes from tens of GB to several TB, so the duration of the operation is not negligible, sometimes it may take hours, because on most SSDs the writing speeds drop to quite low values for multi-GB data sizes.

Moreover, since SSDs have replaced HDDs, the size of the write buffer has become much more important than before. There are many SSDs where certain buffer sizes can increase the writing speed a lot, typically when the buffer size matches the size of the erase block, which might be of 128 kB or 64 kB, or of another similarly large size. The right buffer size must usually be determined by experiments, so there is no chance that a standard copying program will choose it by default.


Sure, pv might be obscure, but it's stream-oriented, where dd is file-oriented.

Incidentally, we get another Useless Use Of `Cat something | pv > outfile`, instead of `pv < infile > outfile` (or even `cat < infile > outfile`, let the shell do its job ffs)


One nice use of dd is to append an ssh key to .authorized_keys on a host that doesn't allow shell or sftp access (which is what ssh-copy-id needs):

  cat id_rsa.pub | ssh $host 'dd of=.ssh/authorized_keys oflag=append conv=notrunc'

Well normally you'd just use tee -a here except that's not whitelisted on the particular service you and I are using.

My main use of dd is when redirect doesn't have access. E.g. "tar cf - . | ssh foo 'sudo dd of=/some/path/blah.tar'".

Not the best of examples, but it comes up all the time. Sure, I could use "tee /some/path/blah.tar > /dev/null", but that has a greater risk of me forgetting the redirect, and getting garbage to the terminal.


The best thing about dd is the argument system. I like the key=value format. It feels better to me.

I think dashed arguments, that is, getopt style, just add unnecessary noise. This is usually not so bad, but it gets pretty annoying when you have an actual language exposed in the args; I'm looking at you, iptables. In iptables the readability would be far improved if they had just left off the dashes.

Then there is the unholy abomination that is --key=value. What is that -- prefix doing for anyone? Just drop it and use dd-style key=value args, and everyone (ok, perhaps just me) will thank you.

So my conclusion is: ditch cat and just use dd. The bonus is that dd works in both modes, file to pipe and pipe to file; the downside is that dd can't concatenate files, but who actually does that (joke).


I have a terrible memory for things I use infrequently so personally I value consistency over syntax tweaks.

Eventually you'll probably need both but I think the point of the article is that being closer to the Unix/POSIX way will give you more reusable knowledge.

That's certainly true when you're starting out at least. The more small tools you can combine without thinking too much about it the more usable Unix is.


There is no consistency in Unix tooling; the best you can hope for is a good man page. If you are talking about how they all have dashes in front of the flags... I don't see how that helps.

But you are right about the power of unix being that it is a very expressive system exposed in a fairly simple syntax.

We have fallen far from the tree. But I think the real genius of the original Unix was in what it did not do. There was a real desire to keep it a fun, simple, usable system. And it was, and still is, as proven by its huge popularity and influence even today, 50 years later.


Written by someone who has never contended with byte order. Yes. It is still a thing.
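To be fair to dd here: conv=swab, which swaps each pair of input bytes, is the classic answer when byte order bites. A minimal sketch (file names made up):

    # swap adjacent bytes, e.g. 16-bit big-endian samples -> little-endian
    dd if=be16.raw of=le16.raw conv=swab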

I think it's quite unfortunate that the command line tools seem to have gotten frozen in stone.

People barely use dd for anything but copying disk images -- but it's kind of a clunky tool for that. It's got a weird, unusual syntax, it requires knowing arcane details like what size sectors a device likes working with, and it's easy to shoot yourself in the foot with it.

You'd think that after decades somebody would have made a handy tool for actually writing disk images as an end user. It could use more standard arguments, do sanity checks for destination device contents and mounted filesystems, and automatically determine the optimal block size for the destination device.

It's almost as if we decided that a chisel is the normal tool to use as a flat screwdriver.


> You'd think that after decades somebody would have made a handy tool for actually writing disk images as an end user.

There are lots of tools out there for creating disk images/boot disks etc.


I mean as a standard tool in general usage. Somehow after decades of installing various distros, dd is still the standard thing to use.

Ubuntu's docs suggest using Startup Disk Creator

https://ubuntu.com/tutorials/create-a-usb-stick-on-ubuntu#4-...

Fedora recommends using Fedora Media Writer, and other tools like Unetbootin are very popular.


I haven't used it in a while, but it was very handy for converting fixed-record-length files into linefeed-delimited files, and vice-versa.
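For anyone who hasn't run into this: conv=unblock and conv=block with cbs= do exactly that conversion. A sketch, assuming 80-byte records and hypothetical file names:

    # fixed 80-byte records -> newline-delimited lines (trailing spaces trimmed)
    dd if=records.dat of=lines.txt cbs=80 conv=unblock
    # and back: lines -> 80-byte records, padded with spaces
    dd if=lines.txt of=records.dat cbs=80 conv=block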

Zelda 1 fireball as icon!

> There's an obscure option to GNU dd to get it to display a progress meter as well. But why bother memorizing that?

Why bother memorizing when you can man dd?

Proud member of the cult! I've used dd for all kinds of things:

  - Extract the magic bits from a file
  - Copy the disk MBR
  - Make a disk image with a specific block size and confirm it wrote correctly
  - Create loopback disk images
  - Over-format floppy disks for Linux distros that need every extra byte
  - "Securely" delete files (overwrite exact file size with random junk)
  - Copy data more efficiently by changing block size
Some of dd's useful functions (a few are sketched after the list):

  - format fixed-length records from newline-separated input
  - transform lowercase to uppercase and vice-versa
  - create sparse files
  - don't truncate data
  - don't stop processing on errors
  - skip and seek in files, input and output, separately
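A few of those sketched out (device and file names are placeholders):

    # lowercase -> uppercase
    dd if=notes.txt of=upper.txt conv=ucase
    # grab 1 MiB from the middle of an image: skip= offsets the input, seek= the output
    dd if=disk.img of=chunk.bin bs=1M skip=5 count=1
    # keep reading past errors, padding bad blocks with zeros
    dd if=/dev/sdX of=rescue.img bs=64K conv=noerror,sync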

I agree. My use of dd just scratches the surface of its capabilities, but this article hasn't convinced me in the slightest to use anything else.

I've seen a lot of hate for dd lately and I don't really understand why, except that maybe people are getting hung up on the nonstandard arg format and unaware of the tool's versatility. I'm not convinced that cat or tail are better (or even as good) for the examples listed in TFA, and if I couldn't remember something as easy as "status=progress" there's no way I'm going to memorize those cat and head pipe contraptions.


> Another reason to prefer the cat variant is that it lets you actually string together a normal shell pipeline.

That's just as easy with dd; decompress | dd is especially useful:

    xzcat linux.img.xz | dd of=/dev/sdb bs=1M

Throw pv in there if you want to.

> here are two ways to create a 100 MB file containing all zeroes

And here's a better way: truncate -s 100M zero.img


I get that using `cat` is a lot more intuitive for people than `dd` for the most common use cases for `dd`, but calling it a cult is really weird. Of course it's "almost never necessary"; it's just copying bytes, and there are a ton of ways to do that.

When did it become possible to use cat and pipes on block devices? I don’t think this was always the case, or was it?

> I won't blame you for reaching for dd. But otherwise, try to stick to more standard Unix tools.

dd is one of the most standard Unix tools around. It's been a core part of Unix for 40 years. It has its issues, but "non-standard" isn't one of them.


I guess we’re a loud and vocal cult.

"Who cares?" The author of this article, apparently. It doesn't help that the arguments are dismissive and incorrect.

dd is a straight razor, and if used well it can optimize disk transfer speeds for the specific hardware you're running on (for large data transfers).

Don't believe me? Use dd to create a sizable sample file (let's say 2 gigs or so) of random data and try copying that file to your disks with different block sizes (powers of two). Write a small script, and run all your tests with the time command to compare results. Go make dinner, and by the time you're back you should know the optimal block size to run your fucking huge disk transfer at.
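Something like this, roughly; treat it as a sketch (the target path and candidate sizes are placeholders, and oflag=direct is GNU-only):

    #!/bin/sh
    # one-time setup: a ~2 GiB file of random data
    dd if=/dev/urandom of=testfile bs=1M count=2048
    # try a few power-of-two block sizes; GNU dd reports MB/s when each copy finishes
    for bs in 4K 64K 128K 512K 1M 4M; do
        echo "--- bs=$bs ---"
        dd if=testfile of=/mnt/target/testcopy bs="$bs" oflag=direct
        rm -f /mnt/target/testcopy
    done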

dd is perfect as it is. It does exactly what you tell it to do.


The real reason you see dd specified for this sort of thing is that most people posting such instructions are just copy/pasting from some older doc, or from a cheat sheet they came up with years ago, or maybe from memory from stuff they learned 25 years ago.

I wouldn’t call that a cult. More of a “this one-off command I run once every five years isn’t something I need to learn or understand, because any ‘better’ alternative won’t stick with me anyway.”


ddrescue, on the other hand, is completely worthwhile

I use dd instead of cat because the only time I need dd is to make a file full of zeros in an automated script.
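(Which, for the record, is just something like this; the size is arbitrary:)

    # 100 MB of zeros for the script to use
    dd if=/dev/zero of=zeros.img bs=1M count=100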

Otherwise I use Etcher, or the Pi flasher utility that has really cool extra features.

No I do not care that Etcher is 80MB. Not my app, I didn't pay for it, I don't have to deal with the code, I'm not going to complain.


These days, my main use of dd is to get a specific amount of data from a file, where both "bs" and "count" are useful. (No, "bs" does not only set the buffer; it also sets the chunk size for reads and writes, which is SOMETIMES useful and/or necessary with tapes.)

So, this is an approximation of a command pipeline I run several times per year, when I happen to need a secret of an approximate length:

    dd if=/dev/urandom bs=6 count=1 | base64
Tune the "bs=6", depending on how long you want your (guaranteed typeable) secret to be. Every 3 bytes in the input will give you 4 characters in the output and keeping the input block size a multiple of 3 avoids having the output ending in "=".

It MAY be possible to replace this use of dd with other shell commands. But, since I needed to learn enough of dd to cope with tapes, I use that.
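For what it's worth, GNU head can do the same job, if you'd rather; the byte-count maths is identical:

    # 6 random bytes -> 8 base64 characters, no padding
    head -c 6 /dev/urandom | base64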


This post makes me wish I could down-vote posts on HN.
