
If anyone is wondering why `read` can reliably read a single line while `head` can't, it's because `read` reads byte by byte.

This is just as inefficient as it sounds, but it doesn't matter much in practice since you rarely read a lot with it.
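To make the trade-off concrete, here's a rough Python sketch of what the shell `read` builtin effectively does: one `read(2)` syscall per byte, stopping exactly at the newline so the file descriptor's offset is left at the start of the next line for whoever reads it next. (This is an illustrative model, not the actual shell implementation.)

```python
import os
import tempfile

def read_line_bytewise(fd):
    """Read one line from fd a byte at a time, like the shell `read` builtin.

    Because we never read past the newline, the fd's offset stops exactly
    at the start of the next line for whoever reads the fd after us.
    """
    chunks = []
    while True:
        b = os.read(fd, 1)          # one byte per syscall: slow but precise
        if b == b"" or b == b"\n":  # EOF or end of line
            break
        chunks.append(b)
    return b"".join(chunks)

# Demo: the first line is consumed; the rest of the file is untouched.
with tempfile.TemporaryFile() as f:
    f.write(b"first\nsecond\nthird\n")
    f.seek(0)                        # flush and rewind the underlying fd
    fd = f.fileno()
    first = read_line_bytewise(fd)   # b"first"
    rest = os.read(fd, 4096)         # b"second\nthird\n"
```

A buffered tool like `head` would instead slurp a large block in one syscall, dragging the fd's offset far past the first newline.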




>It also happens to nearly always be the wrong way to read a file

It's the wrong way in many cases, but it's the right way in a very large number of cases, possibly the majority.

Often you need to read in a relatively small file, then do something trivial with it, or toss it through a couple of library-provided string processing functions. Or maybe the files are a bit bigger, and you're writing a script to automate some grunt work. You expect to run this script once. Or, more generally, maybe you're writing some code that gets called once a month and takes less than a second to complete.

In any of those situations, it would be silly to do anything other than slurping. String manipulation is easy to reason about. Stream processing is not.

Also note that most OSes cache files in memory, so if you are reading the same file often, the slowdown from reading the data into memory is drastically reduced.


That issue is about readLine specifically, right? Not reading a file in general. The only time I ever get to use readLine is to solve adventofcode problems.

I agree that it's good to have an easy way to read a whole file in, but when I think about it, I can't recall any case where I've had to write code to read a file that I didn't want to process line by line — and in that case, the non-slurping method is actually less code.

They didn't specify needing to read the file line by line. You could read the whole file at once. There might not even be any newline characters in the file. You invented a requirement.

I just use "read" which is only one line of code, and has about as many features as linenoise.

readline is "bloated" because it actually does something.


If you're processing something one line at a time, and outputting something based on each line, then you don't need to read-and-process the entire file before printing everything out.

Other than that, it's more memory efficient.
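The point above can be sketched in a few lines of Python: a streaming filter emits output as each line is processed, so memory use is bounded by the longest line rather than the file size, and downstream consumers start seeing output before the input is fully read. (The `uppercase` transform is just a stand-in for whatever per-line work you're doing.)

```python
import io

def uppercase_stream(src, dst):
    """Process src line by line, writing each result to dst immediately.

    Memory use is bounded by the longest line, not the total file size,
    and output appears before the input has been fully consumed.
    """
    for line in src:             # file-like objects iterate lazily, line by line
        dst.write(line.upper())  # emit as soon as each line is processed

# Demo with in-memory streams standing in for files or pipes.
src = io.StringIO("one\ntwo\n")
dst = io.StringIO()
uppercase_stream(src, dst)
result = dst.getvalue()          # "ONE\nTWO\n"
```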


`tail` employs a large read buffer as well, but that doesn't matter because you wouldn't use it in the same manner.

`tail` is the right tool for the job here. But if you wish to stick with your idiom, `read` will reliably consume a single line of input, regardless of how it is implemented:

  (read -r; cat) < file

Using readlines() just to pick the first line seems weird if you expect a single line. I'd read the whole thing in one go with read(). Then I start wondering whether it wouldn't make more sense to keep the stream open and just iterate over new messages line by line, avoiding the overhead of loading an executable, importing libraries, etc. on every request.

If I understand correctly, this works more like a pipe and not a file in the sense that you can only read what you wrote once, right? But a nice trick nonetheless.

Here's some example C using one:

    ssize_t bytesRead = read(fd, buf, BUF_SIZE - 1);
You read from the file descriptor into the buffer, just like in Go you read from a file into a byte array. Neither the file descriptor nor the file is conceptualized as a tool for reading. The file descriptor is merely an abstraction of the file, one which extends the concept of "file" to include pipes, sockets, and other I/O.

>It reads data from reader and writes to writer... simple. Now what makes this little function so darn useful is it takes anything fulfilling its interfaces (io.Writer and io.Reader). The first way you will probably use it will be to copy between some stream and a file without having to eat up all the memory to store the buffer (not using ioutil.ReadAll for example)... but then you realize you can use a gzip compressor on the writer side, or a network socket, or your own code... and io.Copy works with anything that fulfills its interface

And how is that any different than any language with interfaces? (Besides the implicit thing?).


Ahh ok so to be correct you have to keep reading until you get an empty read?
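Right — a single read may return fewer bytes than requested (especially on pipes and sockets), and the only reliable end-of-data signal is an empty read. Here's a minimal Python sketch of that loop using `os.read`, which has the same contract as the `read(2)` syscall; no need to query the file size at all:

```python
import os

def read_all(fd, chunk_size=4096):
    """Read from fd until read() returns b'' (EOF).

    A single os.read may return fewer bytes than requested (pipes, sockets,
    interrupted syscalls), so the only reliable end-of-data signal is an
    empty read -- not a short one, and not a size queried up front.
    """
    parts = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:              # b'' means EOF; a short read does not
            return b"".join(parts)
        parts.append(chunk)

# Demo with a pipe, where the total size is unknowable in advance.
r, w = os.pipe()
os.write(w, b"hello, world")
os.close(w)                        # closing the write end produces EOF on r
data = read_all(r)                 # b"hello, world"
os.close(r)
```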

Maybe I don’t need to query the file size at all?


A lot of things are supposed to work certain ways but in fact I never really used `Read` for anything more than reading simple user config in scripts. Usually my data is serialized/deserialized from more common formats such as JSON.

Not an expert, but after reading around: OP is saying you only need a constant-sized buffer (which could be as little as one character deep) to read the whole file looking for line breaks. This takes roughly O(1) space (a lot of commenters are confusing time with space requirements), but there's a little nitpick to be found: for extremely big files, the line counter itself would be O(log n) and would dominate.
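The constant-space approach described above can be sketched like this: only `buf_size` bytes are ever resident, regardless of file size. (The 4096-byte buffer is an arbitrary choice; a 1-byte buffer would also work, just with far more syscalls.)

```python
import os
import tempfile

def count_lines(path, buf_size=4096):
    """Count newline bytes using a fixed-size buffer.

    Space is O(1) in the file size: at most buf_size bytes are held at a
    time, plus the counter itself (which grows like O(log n) bits).
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(buf_size)        # never holds more than buf_size bytes
            if not buf:
                break
            count += buf.count(b"\n")
    return count

# Demo on a small temporary file with three lines.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"a\nb\nc\n")
    name = f.name
n = count_lines(name)                     # 3
os.unlink(name)
```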

Wait, I thought that lazy io (e.g. getting a lazy string back from "reading" a file, which triggers subsequent reads when you access it) was widely considered bad and a mistake.

How exactly can one mess up reading bytes from `/dev/urandom`? Serious question.

Open the file. Read from it. If no failures on open or read, you have random bytes. In essence, there is already a library for this: `open` and `read`, which seems to be the same API surface area as this library.
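For the record, here's what "open it and read it" looks like in Python using the raw `os` calls, on a Unix-like system. The one genuine subtlety is the usual read() contract: a read may return fewer bytes than asked for, so loop until you have them all.

```python
import os

def random_bytes(n):
    """Read exactly n bytes from /dev/urandom (Unix-only sketch).

    The only real subtlety is the standard read() contract: a single read
    may return fewer bytes than requested, so we loop until n bytes have
    accumulated.
    """
    fd = os.open("/dev/urandom", os.O_RDONLY)
    try:
        buf = b""
        while len(buf) < n:
            chunk = os.read(fd, n - len(buf))
            if not chunk:   # EOF is not expected from urandom
                raise OSError("unexpected EOF from /dev/urandom")
            buf += chunk
        return buf
    finally:
        os.close(fd)

sample = random_bytes(16)   # 16 random bytes
```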


> you can only read what you wrote once

In the way I’ve written it, yes. But you can open as many file descriptors as you need before you `rm` the file, and they each will keep their own position in the file. You can’t do a dynamic number of reads through the file (at least without dumping the contents somewhere else first), but a fixed number greater than one is absolutely possible.
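The multiple-descriptor trick can be demonstrated in a few lines of Python: open the file twice, unlink it, and each descriptor still reads independently from its own offset until the last one is closed.

```python
import os
import tempfile

# Write a scratch file, open it twice, then remove its directory entry.
# Each descriptor keeps its own offset, and the data stays readable
# until the last descriptor is closed -- the same trick as with `rm`.
dirpath = tempfile.mkdtemp()
path = os.path.join(dirpath, "scratch")
with open(path, "wb") as f:
    f.write(b"hello\n")

fd1 = os.open(path, os.O_RDONLY)
fd2 = os.open(path, os.O_RDONLY)
os.unlink(path)                  # no name left, but both fds still work

a = os.read(fd1, 2)              # b"he"    -- fd1 advances by 2
b = os.read(fd2, 5)              # b"hello" -- fd2 has its own offset
c = os.read(fd1, 4)              # b"llo\n" -- fd1 resumes where it left off

os.close(fd1)
os.close(fd2)
os.rmdir(dirpath)
```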


> looping over `read` is essentially never the right thing to do

Why? I do it quite often, though admittedly usually in one-time scripts.


> so you can read them with any tool

what tool can't you read them with? Why is the fs the fundamental unifying abstraction, as opposed to pipes?

> doesn't allow you to do all the things you can do with a real file

like what?

