
There's a common idiom of skipping a file header and handing off processing to some other program, like this:

    cat foo | ( dd bs=$HEADERSIZE count=1 of=/dev/null; process-foo-contents )
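
For anyone who wants to try the idiom, here is a minimal sketch. The sample file, its 6-byte header, and `cat` standing in for `process-foo-contents` are all made up for illustration. It reads from a regular file and uses `count=1`, because on a pipe a single dd read may return fewer bytes than `bs` (GNU dd has `iflag=fullblock` for that case).

```shell
# Hypothetical sample file: a 6-byte header followed by the payload.
tmp=$(mktemp)
printf 'HEADERline one\nline two\n' > "$tmp"
HEADERSIZE=6

# dd reads exactly one HEADERSIZE-byte block from the shared stdin and
# discards it; cat (standing in for the real consumer) then reads the rest.
body=$( (dd bs="$HEADERSIZE" count=1 of=/dev/null 2>/dev/null; cat) < "$tmp" )

printf '%s\n' "$body"   # prints "line one" and "line two"
rm -f "$tmp"
```

The key point is that dd and the second command share one file descriptor, so whatever dd consumes is gone by the time the consumer starts reading.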



Isn't this the same as:

    tail -c +$((HEADERSIZE + 1)) <foo
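
One thing to watch with the tail version: `-c +K` starts output at the Kth byte, so skipping N bytes takes `+N+1`. A quick sketch with a made-up 6-byte header shows the difference:

```shell
HEADERSIZE=6   # made-up header length for this example

# Start at byte HEADERSIZE+1, i.e. skip exactly HEADERSIZE bytes.
body=$(printf 'HEADERpayload\n' | tail -c +$((HEADERSIZE + 1)))

# Start at byte HEADERSIZE: the last header byte leaks into the output.
off_by_one=$(printf 'HEADERpayload\n' | tail -c +"$HEADERSIZE")

printf '%s\n%s\n' "$body" "$off_by_one"
```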

The dd idiom can be used to split a file into parts with known block size, something like

    (dd bs=$SIZE1 count=1 of=file1; dd bs=$SIZE2 ...
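
A small sketch of that splitting idea, under assumptions of my own: the part sizes (4 and 6 bytes), the file names, and the trailing `cat` that collects the remainder are invented for the example, and the input is a regular file so each dd read returns a full block.

```shell
# Hypothetical input: a 4-byte part, a 6-byte part, then everything else.
dir=$(mktemp -d)
printf 'AAAABBBBBBrest' > "$dir/in"

(
  dd bs=4 count=1 of="$dir/part1" 2>/dev/null   # first 4 bytes
  dd bs=6 count=1 of="$dir/part2" 2>/dev/null   # next 6 bytes
  cat > "$dir/part3"                            # whatever is left
) < "$dir/in"

p1=$(cat "$dir/part1"); p2=$(cat "$dir/part2"); p3=$(cat "$dir/part3")
printf '%s | %s | %s\n' "$p1" "$p2" "$p3"
rm -rf "$dir"
```

Each command picks up at the offset where the previous one stopped, which is why the sizes have to be known up front.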

Cool, although this doesn't really help with my most common case: skipping the first line.

    (head -1 > /dev/null; cat -) < file

head is line-wise. dd is byte-wise.

Yeah, I was just saying that, generally, when I need to strip off a header, the header is the first line of the file.

Unintuitively, tail is the utility you want:

  $ tail -n +2 file

From tail(1):

  -n, --lines=K
    output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth
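
As a quick illustration of `-n +K`, here is a sketch with a made-up three-line CSV where the first line is the header:

```shell
# Hypothetical sample file with a one-line header.
tmp=$(mktemp)
printf 'name,qty\napple,3\npear,5\n' > "$tmp"

# +2 means "start output at line 2", so only the header is dropped.
rows=$(tail -n +2 "$tmp")

printf '%s\n' "$rows"   # prints "apple,3" and "pear,5"
rm -f "$tmp"
```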

Yeah, I know about that, I just prefer not to use that option because `head -1 > /dev/null` is clearer.

This is undefined behavior: while `head -1` will only output a single line, it may read more.

It happens to work with GNU head when stdin is a seekable file, because GNU head specifically rewinds the stream before exiting:

    $ (strace  -e read,write,lseek head -1 > /dev/null; cat -) < file
    ...
    read(0, "hello\nworld\n", 8192)         = 12
    lseek(0, -6, SEEK_CUR)                  = 6    # <-- here
    write(1, "hello\n", 6)                  = 6
    +++ exited with 0 +++

If not for that explicit `lseek`, `head -1` would have skipped the entire 8k buffer.

As far as I know, this is exclusive to GNU head. Neither Busybox nor OSX head will do this, so they throw away an entire buffer instead of just the first line. You can try it out:

    (busybox head -1 > /dev/null; cat -) < file
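
If you don't have Busybox around, a rough way to check whichever `head` is on your PATH is to count what is left for `cat`. This is only a sketch with a made-up three-line file; the result depends on the implementation, so I'm not claiming a particular output.

```shell
# Hypothetical sample file, much smaller than any read buffer.
tmp=$(mktemp)
printf 'hello\nworld\nagain\n' > "$tmp"

# Count the lines cat can still read after head has printed one line.
left=$( (head -n 1 > /dev/null; cat) < "$tmp" | wc -l | tr -d ' ' )

# 2 suggests head repositioned the file offset after the first line;
# 0 suggests it consumed its whole read buffer and left nothing behind.
echo "lines left after head: $left"
rm -f "$tmp"
```

Feeding the same data through a pipe instead of `< "$tmp"` takes seeking off the table entirely, which is the case where the idiom is least trustworthy.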


Interesting. Is this true of `tail -n +2` as well? (On mobile, can't test at the moment).

That always reads to EOF, so it can't be used in the same way.

Tail employs a large read buffer as well, but it does not matter because you wouldn't use it in the same manner.

Tail is the right tool for the job here. But if you wish to stick with your idiom, read will reliably consume a single line of input, regardless of how it is implemented:

  (read -r; cat) < file

If anyone is wondering why `read` can reliably consume a single line while `head` can't, it's because `read` reads its input one byte at a time and stops at the newline.

This is just as inefficient as it sounds, but it doesn't matter much in practice since you rarely read a lot with it.
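
Here is a small sketch of the `read` approach on a pipe, where nothing can seek back. The input and the variable name `discard` are made up for the example. I pass a variable name because POSIX specifies `read` with at least one, and some shells (dash, as far as I know) reject a bare `read -r`.

```shell
# Made-up input: one header line followed by two data lines, via a pipe.
rest=$(printf 'header\nrow 1\nrow 2\n' | { read -r discard; cat; })

printf '%s\n' "$rest"   # prints "row 1" and "row 2"
```

Because `read` stops right after the first newline, `cat` sees every remaining byte even though the pipe cannot be rewound.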

