
If anyone is wondering why `read` can reliably read a single line while `head` can't, it's because `read` reads byte by byte.

This is just as inefficient as it sounds, but it doesn't matter much in practice since you rarely read a lot with it.
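To make the trade-off concrete, here's a rough Python sketch of what the shell `read` builtin effectively does: one `read(2)` syscall per byte, stopping exactly at the newline so the file descriptor's offset is left at the start of the next line for whoever reads it next. (This is an illustrative model, not the actual shell implementation.)

```python
import os
import tempfile

def read_line_bytewise(fd):
    """Read one line from fd a byte at a time, like the shell `read` builtin.

    Because we never read past the newline, the fd's offset stops exactly
    at the start of the next line for whoever reads the fd after us.
    """
    chunks = []
    while True:
        b = os.read(fd, 1)          # one byte per syscall: slow but precise
        if b == b"" or b == b"\n":  # EOF or end of line
            break
        chunks.append(b)
    return b"".join(chunks)

# Demo: the first line is consumed; the rest of the file is untouched.
with tempfile.TemporaryFile() as f:
    f.write(b"first\nsecond\nthird\n")
    f.seek(0)                        # flush and rewind the underlying fd
    fd = f.fileno()
    first = read_line_bytewise(fd)   # b"first"
    rest = os.read(fd, 4096)         # b"second\nthird\n"
```

A buffered tool like `head` would instead slurp a large block in one syscall, dragging the fd's offset far past the first newline.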




>It also happens to nearly always be the wrong way to read a file

It's the wrong way in many cases, but it's the right way in a very large number of cases, possibly the majority.

Often you need to read in a relatively small file, then do something trivial with it, or toss it through a couple of library-provided string processing functions. Or maybe the files are a bit bigger, and you're writing a script to automate some grunt work. You expect to run this script once. Or, more generally, maybe you're writing some code that gets called once a month and takes less than a second to complete.

In any of those situations, it would be silly to do anything other than slurping. String manipulation is easy to reason about. Stream processing is not.

Also note that most OSes cache files in memory, so if you are reading the same file often, the slowdown from reading the data into memory is drastically reduced.


That issue is about readLine specifically, right? Not reading a file in general. The only time I ever get to use readLine is to solve adventofcode problems.

I agree that it's good to have an easy way to read a whole file in, but when I think about it, I can't recall any case where I've had to write code to read a file that I didn't want to process line by line — and in that case, the non-slurping method is actually less code.

They didn't specify needing to read the file line by line. You could read the whole file at once. There might not even be any newline characters in the file. You invented a requirement.

I just use "read" which is only one line of code, and has about as many features as linenoise.

readline is "bloated" because it actually does something.


If you're processing something one line at a time, and outputting something based on each line, then you don't need to read-and-process the entire file before printing everything out.

Other than that, it's more memory efficient.
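The point above can be sketched in a few lines of Python: a streaming filter emits output as each line is processed, so memory use is bounded by the longest line rather than the file size, and downstream consumers start seeing output before the input is fully read. (The `uppercase` transform is just a stand-in for whatever per-line work you're doing.)

```python
import io

def uppercase_stream(src, dst):
    """Process src line by line, writing each result to dst immediately.

    Memory use is bounded by the longest line, not the total file size,
    and output appears before the input has been fully consumed.
    """
    for line in src:             # file-like objects iterate lazily, line by line
        dst.write(line.upper())  # emit as soon as each line is processed

# Demo with in-memory streams standing in for files or pipes.
src = io.StringIO("one\ntwo\n")
dst = io.StringIO()
uppercase_stream(src, dst)
result = dst.getvalue()          # "ONE\nTWO\n"
```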


`tail` employs a large read buffer as well, but that doesn't matter because you wouldn't use it in the same manner.

`tail` is the right tool for the job here. But if you wish to stick with your idiom, `read` will reliably consume a single line of input, regardless of how it is implemented:

  (read -r; cat) < file

Using readlines() just to pick the first line seems weird if you expect a single line. I'd read the whole thing in one go with read(). Then I start wondering whether it wouldn't make more sense to keep the stream open and just iterate over new messages line by line, avoiding the overhead of loading an executable, importing libraries, etc. on every request.

If I understand correctly, this works more like a pipe and not a file in the sense that you can only read what you wrote once, right? But a nice trick nonetheless.

Here's some example C using one:

    ssize_t bytesRead = read(fd, buf, BUF_SIZE - 1);
You read from the file descriptor into the buffer, just like in Go you read from a file into a byte array. Neither the file descriptor nor the file is conceptualized as a tool for reading. The file descriptor is merely an abstraction of the file, one which extends the concept of "file" to include pipes, sockets, and other I/O.

>It reads data from reader and writes to writer... simple. Now what makes this little function so darn useful is it takes anything fulfilling its interfaces (io.Writer and io.Reader). The first way you will probably use it will be to copy between some stream and a file without having to eat up all the memory to store the buffer (not using ioutil.ReadAll for example)... but then you realize you can use a gzip compressor on the writer side, or a network socket, or your own code... and io.Copy works with anything that fulfills its interface

And how is that any different than any language with interfaces? (Besides the implicit thing?).


Ahh ok so to be correct you have to keep reading until you get an empty read?
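Right — a single read may return fewer bytes than requested (especially on pipes and sockets), and the only reliable end-of-data signal is an empty read. Here's a minimal Python sketch of that loop using `os.read`, which has the same contract as the `read(2)` syscall; no need to query the file size at all:

```python
import os

def read_all(fd, chunk_size=4096):
    """Read from fd until read() returns b'' (EOF).

    A single os.read may return fewer bytes than requested (pipes, sockets,
    interrupted syscalls), so the only reliable end-of-data signal is an
    empty read -- not a short one, and not a size queried up front.
    """
    parts = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:              # b'' means EOF; a short read does not
            return b"".join(parts)
        parts.append(chunk)

# Demo with a pipe, where the total size is unknowable in advance.
r, w = os.pipe()
os.write(w, b"hello, world")
os.close(w)                        # closing the write end produces EOF on r
data = read_all(r)                 # b"hello, world"
os.close(r)
```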

Maybe I don’t need to query the file size at all?


A lot of things are supposed to work certain ways but in fact I never really used `Read` for anything more than reading simple user config in scripts. Usually my data is serialized/deserialized from more common formats such as JSON.

Not an expert, but after reading around: OP is saying you only need a constant-sized buffer (which could be as little as one character deep) to read the whole file looking for line breaks. This takes roughly O(1) space (a lot of commenters are confusing time with space requirements), but there's a little nitpick to be found: for extremely big files, the line counter itself would be O(log n) and would dominate.
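The constant-space approach described above can be sketched like this: only `buf_size` bytes are ever resident, regardless of file size. (The 4096-byte buffer is an arbitrary choice; a 1-byte buffer would also work, just with far more syscalls.)

```python
import os
import tempfile

def count_lines(path, buf_size=4096):
    """Count newline bytes using a fixed-size buffer.

    Space is O(1) in the file size: at most buf_size bytes are held at a
    time, plus the counter itself (which grows like O(log n) bits).
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(buf_size)        # never holds more than buf_size bytes
            if not buf:
                break
            count += buf.count(b"\n")
    return count

# Demo on a small temporary file with three lines.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"a\nb\nc\n")
    name = f.name
n = count_lines(name)                     # 3
os.unlink(name)
```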

Wait, I thought that lazy io (e.g. getting a lazy string back from "reading" a file, which triggers subsequent reads when you access it) was widely considered bad and a mistake.

How exactly can one mess up reading bytes from `/dev/urandom`? Serious question.

Open the file. Read from it. If no failures on open or read, you have random bytes. In essence, there is already a library for this: `open` and `read`, which seems to be the same API surface area as this library.
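For the record, here's what "open it and read it" looks like in Python using the raw `os` calls, on a Unix-like system. The one genuine subtlety is the usual read() contract: a read may return fewer bytes than asked for, so loop until you have them all.

```python
import os

def random_bytes(n):
    """Read exactly n bytes from /dev/urandom (Unix-only sketch).

    The only real subtlety is the standard read() contract: a single read
    may return fewer bytes than requested, so we loop until n bytes have
    accumulated.
    """
    fd = os.open("/dev/urandom", os.O_RDONLY)
    try:
        buf = b""
        while len(buf) < n:
            chunk = os.read(fd, n - len(buf))
            if not chunk:   # EOF is not expected from urandom
                raise OSError("unexpected EOF from /dev/urandom")
            buf += chunk
        return buf
    finally:
        os.close(fd)

sample = random_bytes(16)   # 16 random bytes
```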


> you can only read what you wrote once

In the way I’ve written it, yes. But you can open as many file descriptors as you need before you `rm` the file, and they each will keep their own position in the file. You can’t do a dynamic number of reads through the file (at least without dumping the contents somewhere else first), but a fixed number greater than one is absolutely possible.
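The multiple-descriptor trick can be demonstrated in a few lines of Python: open the file twice, unlink it, and each descriptor still reads independently from its own offset until the last one is closed.

```python
import os
import tempfile

# Write a scratch file, open it twice, then remove its directory entry.
# Each descriptor keeps its own offset, and the data stays readable
# until the last descriptor is closed -- the same trick as with `rm`.
dirpath = tempfile.mkdtemp()
path = os.path.join(dirpath, "scratch")
with open(path, "wb") as f:
    f.write(b"hello\n")

fd1 = os.open(path, os.O_RDONLY)
fd2 = os.open(path, os.O_RDONLY)
os.unlink(path)                  # no name left, but both fds still work

a = os.read(fd1, 2)              # b"he"    -- fd1 advances by 2
b = os.read(fd2, 5)              # b"hello" -- fd2 has its own offset
c = os.read(fd1, 4)              # b"llo\n" -- fd1 resumes where it left off

os.close(fd1)
os.close(fd2)
os.rmdir(dirpath)
```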


> looping over `read` is essentially never the right thing to do

Why? I do it quite often, though admittedly usually in one-time scripts.


> so you can read them with any tool

what tool can't you read them with? Why is the fs the fundamental unifying abstraction, as opposed to pipes?

> doesn't allow you to do all the things you can do with a real file

like what?

