
Because a string is just a piece of data, and if your program can take it as input, then it must be handled correctly.

“But writing parsers was sooo boring in college, and who has to do this in real life?”




Because here you're not doing string manipulation but AST manipulation (i.e., manipulating the data structure you get after parsing the language).

Imagine thinking the difference between parsing a string and using a pre-parsed data structure was "trivial for everyone" because you could count off how many classes of character were involved.

One major difference is that, the other way, you don't have to write the parser at all.
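
A minimal sketch of the difference, using a toy arithmetic AST (the Expr type and the rename pass are made up for illustration, not taken from any particular compiler):

    // Toy expression AST -- a stand-in for whatever a real parser produces.
    #[derive(Debug)]
    enum Expr {
        Num(i64),
        Var(String),
        Add(Box<Expr>, Box<Expr>),
    }

    // AST manipulation: rename a variable by walking the structure.
    fn rename(expr: &mut Expr, from: &str, to: &str) {
        match expr {
            Expr::Var(name) if name.as_str() == from => *name = to.to_string(),
            Expr::Add(lhs, rhs) => {
                rename(lhs, from, to);
                rename(rhs, from, to);
            }
            _ => {}
        }
    }

    fn main() {
        // The structure is already parsed; no string processing is needed
        // to transform it.
        let mut ast = Expr::Add(
            Box::new(Expr::Var("x".to_string())),
            Box::new(Expr::Num(1)),
        );
        rename(&mut ast, "x", "y");
        println!("{:?}", ast);
    }

The string-level equivalent, something like source.replace("x", "y"), would also rewrite "x" wherever it happens to appear, including inside longer identifiers and string literals.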


I think it’s just because everyone knows how to work with strings: they’re tangible, and you can see exactly what data a string contains right there in print(thing).

You don’t even have to know the type of the thing to work on its string representation.


> allowing you to write compile-time string parsers

I’m not entirely sure this is a good thing. But it’s certainly convenient in some instances.


Sorry, but this article starts off with an excellent example of why this is horrible:

   val x: String = "hello"
   String x = "hello"
The first line reads: "value x is of type String and contains hello"

The second line reads: "String x contains hello"

val and : are fluff and add nothing. Arguments about the second form being tougher to parse would have some merit if this hadn't all been figured out almost 50 years ago.


It makes it easier to parse a text file without special cases.

It's an insignificant reason, but that's why it's traditionally recommended.


Because people are taught to think in strings. And programming languages coddle them with tools like concatenation and string formatting. And because we let people think they can do useful things with strings as a result.

But what people actually need are grammars.

The exact same reason that parsing HTML with a regex unleashes Zalgo is why generating HTML with string templates is bad: both treat HTML as a string, not as a grammatically restricted language.
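
A minimal sketch of the contrast (the escape function is hand-rolled purely for illustration; a real HTML builder or templating library would enforce escaping through its types rather than by convention):

    // The same page fragment built two ways.
    fn escape_html(input: &str) -> String {
        let mut out = String::with_capacity(input.len());
        for c in input.chars() {
            match c {
                '&' => out.push_str("&amp;"),
                '<' => out.push_str("&lt;"),
                '>' => out.push_str("&gt;"),
                '"' => out.push_str("&quot;"),
                _ => out.push(c),
            }
        }
        out
    }

    fn main() {
        let user_input = "<script>alert(1)</script>";

        // Treating HTML as a string: the input is spliced in as markup.
        let templated = format!("<p>{}</p>", user_input);

        // Treating HTML as a language: text nodes are escaped before they
        // ever become markup.
        let escaped = format!("<p>{}</p>", escape_html(user_input));

        println!("{}", templated); // <p><script>alert(1)</script></p>
        println!("{}", escaped);   // <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
    }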


Almost all of the programs I have ever read parse strings far too often. Almost all scientific and engineering software could eliminate string parsing everywhere except at the UI and the database.

You are not forced to pass strings. That's merely a convenience (for Java). You can construct and pass data structures instead.

A lot of people reach for string handling when the correct thing to do is to intentionally avoid it and only handle strings as opaque UTF-8-encoded bytes that cannot be reasoned about in terms of human language.

I would even argue that having string handling in a standard library (or language) has the potential to cause a net increase in bugs, because people think they are handling strings when they are actually just screwing around with codepoints. Go's string handling is completely broken, for example. Because strings are built into the language, Go programs tend to be more broken than C programs in terms of string handling.
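
To illustrate the codepoint trap in general terms (a Rust sketch of the same class of bug, not a claim about any particular Go API):

    fn main() {
        // "é" written as 'e' plus a combining acute accent: one visible
        // character, two Unicode codepoints, three UTF-8 bytes.
        let s = "e\u{0301}";

        println!("bytes:      {}", s.len());           // 3
        println!("codepoints: {}", s.chars().count()); // 2

        // Per-codepoint operations can split the accent from its base letter.
        let reversed: String = s.chars().rev().collect();
        println!("{:?}", reversed); // the accent now precedes the 'e'
    }

Anything that reasons per codepoint (reversing, slicing, taking the "first character") can produce that kind of result, which is the argument for treating strings as opaque bytes everywhere except the boundaries where they are actually rendered or parsed.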


Different literal types are only necessary when the parser needs to treat the string differently; that's why you need them for raw strings, for instance.

When you don't need to do that, a function call is a much better answer. There are many things that could be string literals but aren't because they don't have to be.
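
For example (standard-library Rust; the specific values are just illustrations): an IP address or a duration could plausibly have had dedicated literal syntax, but an ordinary function call over a string or a number does the job, whereas a raw string genuinely needs the parser's cooperation.

    use std::net::Ipv4Addr;
    use std::time::Duration;

    fn main() {
        // Things that could have been literal forms, but a plain function
        // call works fine:
        let addr: Ipv4Addr = "192.168.0.1".parse().expect("valid IPv4 address");
        let timeout = Duration::from_millis(250);
        println!("{addr} / {timeout:?}");

        // Raw strings are different: the parser itself has to skip escape
        // processing, so they need their own literal syntax.
        let raw = r"C:\temp\new";
        let escaped = "C:\\temp\\new";
        assert_eq!(raw, escaped);
    }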


I was actually looking more for practical issues. The code I write usually doesn't handle strings much. Maybe I reach for other languages when I do, maybe I use other approaches where others would use strings, or maybe I just subjectively don't find them as bad as others do. I'd just like to see exactly what people are complaining about, so I can figure out why I usually don't run into it.

Because then, to parse out a number, you would first need to get it into a string.

Good point!

It is my opinion, however, that string types are fine, just not perfect. Maybe I should have made that clearer.


Looks great. I wonder why the author went for a specific syntax for interned strings, as opposed to doing what Java does (automatically interning String literals).
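
I haven't looked at the article's syntax, but for anyone unfamiliar with what interning buys you, a minimal sketch (the Interner type and its methods are made up for illustration): each distinct string is stored once and handed back as a small integer id, so equality checks become integer comparisons.

    use std::collections::HashMap;

    #[derive(Default)]
    struct Interner {
        ids: HashMap<String, u32>,
        strings: Vec<String>,
    }

    impl Interner {
        fn intern(&mut self, s: &str) -> u32 {
            if let Some(&id) = self.ids.get(s) {
                return id;
            }
            let id = self.strings.len() as u32;
            self.strings.push(s.to_string());
            self.ids.insert(s.to_string(), id);
            id
        }

        fn resolve(&self, id: u32) -> &str {
            &self.strings[id as usize]
        }
    }

    fn main() {
        let mut interner = Interner::default();
        let a = interner.intern("hello");
        let b = interner.intern("hello");
        assert_eq!(a, b); // same string, same id
        println!("{} -> {:?}", a, interner.resolve(a));
    }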

> also `String` and `OsString`

Because these are different types with different semantics.

Why should there be one string type? Languages have to interface with the real world, where string data could be anything, but programs still need to be written for the common cases... hence, &str and String.
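
A small sketch of why those types stay separate (these are the standard Rust types; only the example values are made up):

    use std::ffi::OsString;
    use std::path::PathBuf;

    // Takes a borrowed view of UTF-8 text; no ownership, no copy.
    fn shout(s: &str) -> String {
        s.to_uppercase()
    }

    fn main() {
        // String: owned, growable, guaranteed valid UTF-8.
        let owned: String = String::from("hello");

        // &str: a borrowed slice of UTF-8 text (string literals are &str).
        println!("{}", shout(&owned));
        println!("{}", shout("world"));

        // OsString: whatever the OS gives you. Filenames are not required
        // to be valid UTF-8, so they get their own type and an explicit,
        // fallible conversion to String.
        let os: OsString = PathBuf::from("some_file.txt").into_os_string();
        match os.into_string() {
            Ok(utf8) => println!("valid UTF-8 name: {}", utf8),
            Err(raw) => println!("non-UTF-8 name: {:?}", raw),
        }
    }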


Text (aka strings) exists in virtually all software projects.

For me it is useful to distinguish between text, meaning something intended to be read by humans, and strings, meaning serial sequences of characters that may or may not be human-readable but will be processed by one or more computing automata. For example, in C the string "Hello World" is terminated by a null character. The null character is not part of the text the string encodes.

Or to put it another way, I find that treating strings and text as two different layers of abstraction clarifies my intent. Code that manipulates text is built on code that manipulates strings, and in between there's parsing that has to occur.
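
The C example, rendered through Rust's FFI types just to keep the illustrations in one language: the NUL terminator belongs to the string layer, not the text layer.

    use std::ffi::CString;

    fn main() {
        let text = "Hello World";

        // The text layer: 11 bytes of human-readable content.
        println!("text bytes: {}", text.len()); // 11

        // The string-for-the-machine layer: a C-style string carries a
        // trailing NUL so the automaton on the other side knows where to
        // stop. That byte is part of the encoding, not part of the text.
        let c_string = CString::new(text).expect("no interior NUL bytes");
        println!("C string bytes: {}", c_string.as_bytes_with_nul().len()); // 12
    }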


Working with strings as code causes editors to get confused, is hard to compose, and means you lose out on any kind of static analysis.

This is what I was referring to at the end of my post:

> Few work directly on strings. But they do it naturally, without enforcement - so like, a function might take a 'str', but the 'str' passed in was parsed into a wrapping structure already.

i.e., most programs in typed languages already do what you're saying: they parse the data directly into a structure, and therefore they validate some aspects of it naturally, so even when you do see a 'str' in typed code, it has very often already gone through some sort of parsing phase.
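
A minimal sketch of that pattern, with made-up names (sometimes called the newtype or "parse, don't validate" style):

    // The raw &str only exists at the boundary; past the constructor, the
    // rest of the program works with a type that is known to be valid.
    #[derive(Debug)]
    struct UserId(u32);

    impl UserId {
        fn parse(raw: &str) -> Result<UserId, String> {
            raw.trim()
                .parse::<u32>()
                .map(UserId)
                .map_err(|e| format!("not a valid user id: {e}"))
        }
    }

    // Downstream code never sees the raw string, so it can't forget to validate.
    fn greet(id: &UserId) -> String {
        format!("hello, user #{}", id.0)
    }

    fn main() {
        let input = "  42  "; // untrusted text from the UI, a file, the network
        match UserId::parse(input) {
            Ok(id) => println!("{}", greet(&id)),
            Err(e) => eprintln!("{e}"),
        }
    }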
