Besides the fact that you have to walk the string just to find the length of it...
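To make the cost concrete, here is a minimal sketch in C. The names `walk_length`, `counted_str`, and `counted_length` are made up for illustration and are not from any library; the first function just mirrors what `strlen` has to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the work strlen must do on a null-terminated string.
   It reads every byte until it reaches the terminator, so it is O(n). */
size_t walk_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical counted string: the length is stored next to the data. */
struct counted_str {
    size_t len;
    const char *data;
};

/* Reading the length is a single field load, independent of string size. */
size_t counted_length(const struct counted_str *s)
{
    assert(s != NULL);
    return s->len;
}
```

Both functions return the same number for the same text; the difference is only in how much memory has to be read to get it.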
Null-terminated strings try to save space by encoding the string length implicitly, through the terminator convention, instead of storing it. This can fail due to off-by-one errors, mistakenly allocating a fixed buffer, strings that have an unexpected null in them, and more.
Those can lead to all kinds of nasty problems like buffer overflows, which allow someone who can craft an input to write arbitrary stuff into memory.
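As a rough illustration of the off-by-one case, here is a sketch of a copy that refuses to overflow. `copy_checked` is a hypothetical helper, not a standard function; the point is only that the destination needs room for the text plus one terminator byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy. Returns 0 on success, -1 if src (including
   its terminator) does not fit in dst_size bytes. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    /* The text needs len + 1 bytes. Comparing with '>' instead of '>='
       here is exactly the classic off-by-one. */
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1); /* copies the terminator too */
    return 0;
}
```

With a 6-byte buffer, "hello" fits (5 characters plus the terminator) and "hello!" is rejected rather than written one byte past the end.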
God only knows how many security vulnerabilities and performance problems could be traced back to null-terminated strings.
Yeah, there might be a few cases where null-terminated strings work okay, but mostly they are just a recipe for buffer overflows. Seriously, how many years is this going to take to figure out?
That’s a fair point, but of the over twenty programming languages I’ve used in my career, only C uses null-terminated strings. All the others store the length. There are good reasons for that. I think C strings are objectively bad and error-prone.
That's kind of what I mean. The fact that null-terminated strings can get arbitrarily long doesn't seem like a big advantage. I mean, if you're working with really long strings, you probably want to use something more sophisticated than a character array anyway.
The reality is that null-terminated strings are dramatically more expensive than strings with a length counter in every regard other than memory usage, and the memory overhead of storing a length value is utterly minuscule compared to the actual size of the string. Even if you ignore all the secondary costs that result from the decision to use null-terminated strings, they're just poor engineering. There are far better ways to save a few bytes.
(By secondary costs I mean things like the myriad bugs caused by null-terminated strings, the severe performance penalties involved in copying and manipulating them, the unfortunate implications they have for file formats and network protocols, etc.)
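One example of the manipulation penalty: `strcat` has to rescan the destination for its terminator on every call, so building a string piece by piece goes quadratic. A sketch of the alternative, assuming a hypothetical `append_counted` helper where the caller keeps the length:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical append that tracks the length itself, so it never rescans
   the destination. Returns 0 on success, -1 if the piece plus a terminator
   would not fit in the remaining capacity. */
int append_counted(char *buf, size_t cap, size_t *len,
                   const char *piece, size_t piece_len)
{
    assert(buf != NULL && len != NULL && piece != NULL);
    assert(*len < cap); /* there is always room for the terminator */
    if (piece_len >= cap - *len)
        return -1;
    memcpy(buf + *len, piece, piece_len); /* write at the known end */
    *len += piece_len;
    buf[*len] = '\0'; /* keep it usable by APIs that expect a C string */
    return 0;
}
```

Each call costs time proportional to the piece being appended, not to everything already in the buffer.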
If the lengths are wrong. You said that the function shouldn't assume that strings are null-terminated correctly -- why should it assume that the lengths are correct?
This is why I hate them too. You can use a custom length + pointer type for representing strings in your own code, but interfacing with other libraries and the OS almost always requires having a null-terminated string. It forces you to make copies just to tack on the null terminator.
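For anyone who hasn't hit this, a sketch of the copy in question. `str_view` and `view_to_cstr` are hypothetical names for a length + pointer type and the conversion you end up writing at every library boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length + pointer string. It can point into the middle of a
   larger buffer, so there is no terminator at data[len]. */
struct str_view {
    const char *data;
    size_t len;
};

/* Allocates a null-terminated copy for APIs that demand one.
   Returns NULL if allocation fails; the caller must free() the result. */
char *view_to_cstr(struct str_view v)
{
    assert(v.data != NULL || v.len == 0);
    char *out = malloc(v.len + 1);
    if (out == NULL)
        return NULL;
    if (v.len > 0)
        memcpy(out, v.data, v.len);
    out[v.len] = '\0'; /* the one byte the whole copy exists for */
    return out;
}
```

A view of the first five bytes of "hello world" cannot be handed to a C API directly, because the byte after it is a space, not a terminator. Hence the allocation and copy.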
Actually, null-terminated strings made a lot of sense when strings were short. In my 8-bit days, I used both approaches (because sometimes strings have to contain a NUL).
I should have specified a little more, perhaps. Don't think of it as a null-terminated + length-prefixed string. It's effectively a purely length-prefixed string. There just happens to always be a null one byte after the end of the string.
The length prefix is the only thing you use, ordinarily. The only time the null comes into play is if you've already had a bug.
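Roughly what I have in mind, as a sketch. The `lp_str` layout and `lp_new` are made up for illustration, not taken from a particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string. The length is authoritative;
   data holds len bytes followed by one extra '\0' as a safety net. */
struct lp_str {
    size_t len;
    char data[]; /* flexible array member (C99) */
};

/* Builds an lp_str from len raw bytes, which may include '\0'.
   Returns NULL if allocation fails; the caller must free() the result. */
struct lp_str *lp_new(const char *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    struct lp_str *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    if (len > 0)
        memcpy(s->data, bytes, len);
    s->data[len] = '\0'; /* never consulted by code that uses len */
    return s;
}
```

Code that respects `len` can store bytes containing NULs without trouble. Code that wrongly treats `data` as a C string stops at the first NUL or, at worst, at the extra one past the end, instead of running off into unrelated memory.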
No matter what string manipulation method you use, you have to produce linear, null-terminated strings for passing into all sorts of existing APIs, standard and not.
What are the pros of null-terminated strings? You have to recalculate something you already know at each string operation, potentially failing. It is slow and unsafe.
Languages that have a real string type usually store a length and skip the null terminator. I think C++ stores both a length and a null-terminated buffer for longer text, but for short strings it can “hack” the text itself into the space the pointer would occupy.
> Correct string handling means that you always know how long your strings are
Well, I couldn't think of a stronger argument against null-terminated strings than this. After all, null-terminated strings make no guarantee about having a finite length. Nothing prevents you from building a memory-mapped string that is generated on demand and never ends.
Earlier today I wrote a post about "security problems that C causes" and neglected to mention the use of null-terminated strings instead of a proper data structure that encapsulates length along with the string.
Well, this is what happens when you assume some sort of special data is valid, when it isn't actually. (\0 can appear in a string, it's a perfectly valid character, so it's not safe to use it to terminate the string. But people do anyway.)
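A quick sketch of the mismatch, using a hypothetical `has_embedded_nul` check built on the standard `memchr`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 1 if the len bytes at data contain a '\0'
   before the end, meaning strlen would report a shorter length than the
   real one. Returns 0 otherwise. */
int has_embedded_nul(const char *data, size_t len)
{
    assert(data != NULL);
    return memchr(data, '\0', len) != NULL;
}
```

For the seven bytes of "abc\0def", `strlen` reports 3, so anything that trusts the terminator silently drops "def".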