> the C std lib is the weakest part of the C language and it should only be used as a fallback.
I've been musing for a while now: what would it look like if we were to discard the C library and design a new one, leaving the language itself intact?
Amending the syntax is fun but rapidly becomes a slippery slope; soon enough you find yourself designing a new successor language, as has been done many times before. Simply scrapping the mostly-unhelpful C stdlib and inventing new, modern abstractions for allocation, IO, text, threading, etc. seems like a more tractable problem.
It has the same fundamental problem, though: you have to rewrite most existing code, which hinders adoption. In this case it might actually hinder adoption more than a redesign that also improved the language itself, since people are more willing to take that leap when there are more benefits to be had from it.
Those are just examples. The tricky part is figuring out the different ownership use cases you want to solve. Because C gives you so much freedom and so little in the standard library, you end up with a lot of variations: reference-counted strings, owned buffers, string slices, and so on. You might want certain types to be distinguished at compile time and other types to be distinguished at run time.
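To make that concrete, here is a rough C sketch of three of those variations. The struct names are invented for illustration and not taken from any particular library:

#include <stddef.h>

/* owned, growable buffer: whoever holds it must free it */
struct owned_str {
    char  *data;
    size_t len;
    size_t cap;
};

/* reference-counted string: freed when the last holder releases it (a run-time distinction) */
struct rc_str {
    size_t refs;
    size_t len;
    char   data[];   /* flexible array member */
};

/* non-owning slice/view: borrows someone else's bytes (a compile-time distinction) */
struct str_slice {
    const char *data;
    size_t      len;
};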
The history of changes to this file is interesting as well. This is a relatively nice general-purpose string type—you can easily append to it or truncate it.
It sounds like you’re rephrasing part of my comment back to me, or maybe I’m misinterpreting what you’re saying.
If you’re not convinced of the practicality, it sounds like you are simply not convinced of the practicality of doing string processing in C at all, which is a fair viewpoint. String processing in C is something of a minefield. Libraries like Git’s strbuf are very effective relative to other solutions in C, but lack safety relative to other languages.
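For readers who haven't looked at it: an append/truncate buffer of that kind is roughly shaped like the following. This is a simplified sketch in the same spirit, not Git's actual strbuf code, and error handling is omitted:

#include <stdlib.h>
#include <string.h>

struct strbuf { char *buf; size_t len, cap; };

/* append n bytes, growing the buffer as needed; keeps a trailing NUL */
void sb_add(struct strbuf *sb, const char *s, size_t n)
{
    if (sb->len + n + 1 > sb->cap) {
        sb->cap = (sb->len + n + 1) * 2;
        sb->buf = realloc(sb->buf, sb->cap);   /* sketch: no OOM check */
    }
    memcpy(sb->buf + sb->len, s, n);
    sb->len += n;
    sb->buf[sb->len] = '\0';
}

/* truncating is just shrinking len and re-terminating */
void sb_setlen(struct strbuf *sb, size_t n)
{
    sb->len = n;
    sb->buf[sb->len] = '\0';
}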
No, I'm simply using a different approach, still in C, where strings are simple char*, null-terminated, with nothing hidden in magic fields above the base address of the string.
The trick is to pass an allocator (or container) to string handling functions.
If/when I want to get rid of all the garbage I reset the container/allocator.
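A minimal sketch of that pattern, with invented names (the real code presumably differs): the string functions take an arena-style allocator and hand back plain NUL-terminated char*, and getting rid of the garbage is a single reset:

#include <string.h>

/* a trivial bump allocator standing in for the "container" */
struct arena { char *base; size_t used, cap; };

char *arena_alloc(struct arena *a, size_t n)
{
    if (a->used + n > a->cap) return NULL;   /* sketch: no growth */
    char *p = a->base + a->used;
    a->used += n;
    return p;
}

/* string functions take the allocator; results are ordinary char* */
char *concat(struct arena *a, const char *s1, const char *s2)
{
    size_t n1 = strlen(s1), n2 = strlen(s2);
    char *out = arena_alloc(a, n1 + n2 + 1);
    if (!out) return NULL;
    memcpy(out, s1, n1);
    memcpy(out + n1, s2, n2 + 1);   /* copies s2's terminator too */
    return out;
}

/* "get rid of all the garbage" in one go */
void arena_reset(struct arena *a) { a->used = 0; }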
The old MacOS (pre-X) did just that. Strings were all "Pascal strings", i.e. with the first byte containing the length of the actual string.
Building blocks for memory were also very different from the stdlib, notably the use of Handles, which were pointers to pointers, so that the OS could move a block of data around to defragment the heap behind your back without breaking memory addressing.
Pascal strings are also kind of bad, though. All sub-string operations need allocation, or have to be defined with intermediate results which aren't "really" strings, so in that sense they're not an improvement on zero-terminated strings. Equality tests are cheaper, which is nice, since strings of different lengths compare unequal immediately, but most things aren't really improved.
C++'s string_view is closer to the Right Thing™: a slice. But C++ doesn't (yet) define anywhere what the encoding is, so... it's not what it could be. Rust's str is a slice, and it's defined as UTF-8 encoded.
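In C the equivalent is just a pointer-plus-length pair. A sketch with invented names, showing why taking a substring needs no allocation (unlike the Pascal-string case above):

#include <stddef.h>

struct slice { const char *ptr; size_t len; };

/* a substring is just another view into the same bytes */
struct slice subslice(struct slice s, size_t start, size_t n)
{
    struct slice out = { s.ptr + start, n };   /* caller ensures start + n <= s.len */
    return out;
}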
D's strings were defined to be UTF-8 back in 2000. wstring is UTF-16, and dstring is UTF-32.
Back then it wasn't clear which encoding method would turn out to be dominant, so we did all three. (Java was built on UTF-16.)
As it eventually became clear, UTF-8 is da winnah, and the other formats are sideshows. Windows, which uses UTF-16, is handled by converting UTF-8 to -16 just before calling a Windows function, and converting anything coming back to UTF-8.
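That shim is small in practice. Here is a sketch of the usual Win32 conversion, shown only as an illustration (not D's runtime code), with error handling omitted:

#include <windows.h>
#include <stdlib.h>

/* convert a NUL-terminated UTF-8 string to a freshly allocated UTF-16 one */
wchar_t *utf8_to_utf16(const char *s)
{
    int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);   /* count, incl. terminator */
    wchar_t *w = malloc(n * sizeof *w);
    if (w)
        MultiByteToWideChar(CP_UTF8, 0, s, -1, w, n);
    return w;   /* caller frees; pass to the ...W() function of choice */
}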
D doesn't distinguish between a string and a string view.
Sure, but that's not string_view's problem. You can't just make string_views out of thin air; the string you want to borrow a view into needs to exist first.
Imagine you go to a library and insist on borrowing "My Cousin Rachel", but they don't have it. "Oh I don't care whether you have the book, I just want to borrow it" is clearly nonsense. If they don't have it, you can't borrow it.
> D doesn't distinguish between a string and a string view.
In C++ std::string owns the buffer and std::string_view borrows it. If there is no difference between the two in D, then how is this difference bridged?
You can use automatic memory management and not worry about it. Or you can use D's prototype ownership/borrowing system. Or you can encapsulate them in something that manages the memory. Or you can do ownership/borrowing by convention (it's not hard to do).
I guess I should rephrase. Let's say I have a string, which owns its buffer. What happens in D if I take a substring of it? Does a copy of that section occur to form a new string?
A lot of people don't know about this, but Microsoft is taking steps to move everything over to UTF-8.
They added a setting in Windows 10 to switch the code page over to UTF-8, and then in Windows 11 they made it on by default. Individual applications can turn it on for themselves, so they don't need to rely on the system setting being checked.
With that you can, in theory, just use the -A variants of the winapi with UTF-8 strings. I haven't tried it out yet, as we still support prior Windows releases, but it's nice that Microsoft has found a way out of the UTF-16 mess.
The A-variants had problems years ago, which is why D abandoned them in favor of the W versions.
I don't mind seeing UTF-16 fade away. We've been considering scaling back the D support for UTF-16/32 in the runtime library, in favor of just using converters as necessary. We recommend using UTF-8 as much as practical.
Which is why Free Pascal strings are so awesome. I've personally stuffed a billion bytes into one without issues. They are automatically reference counted, and as close to magic as you can get. You can return one from a function without issue.
However, Free Pascal has the worst documentation of any major project I've ever encountered (The exact opposite of Turbo Pascal), so I can't link to a good reference. Their Wiki is a black hole of nuance and sucks all useful stuff off the internet.
It might fix that particular issue, but you still have the same problem that NUL terminated strings have: it's not possible to cheaply create views/slices of a string using the same type.
I remember that era well! During the first few years I used C, I never touched its standard library at all, using the Mac Toolbox instead. This was a common practice, which later carried over into C++.
How so? First definition I find of glib is "(of words or the person speaking them) fluent and voluble but insincere and shallow", which is mostly what I meant about that answer. There was some sincerity in my answer, but certainly somewhere in the border space of irony and sarcasm, which many people do take as insincerity.
The string handling functions are part of the story, but the null-terminated char * is produced when the compiler reaches a string literal, and writing code without being allowed to just use string literals when it's convenient tends to feel like coding with oven mitts on.
It isn't that much more of a mouthful, and as long as 'my_function' knows to free it, you're A-OK! The only trouble is that '$()' isn't legal in standard C, so a real solution would have to be something like 'str()'.
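A related, non-allocating way to keep literals convenient is a macro over a length-carrying type, since sizeof on a string literal is already known at compile time. This is a different trade-off from the allocating '$()' above (nothing to free), and the name 'str' is used here only because it is the placeholder suggested in the comment:

#include <stddef.h>

struct str { const char *ptr; size_t len; };

/* only valid on string literals and char arrays: sizeof includes the trailing NUL */
#define str(s) ((struct str){ (s), sizeof(s) - 1 })

/* usage: my_function(str("hello")) -- no strlen call, no allocation */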
I'm in WG14, and my opinion is that there isn't one good way to do strings; it all depends on what you value (performance, memory use) and the usage pattern. C in general only deals with data types, not their semantic meaning. (IOW, we say what a float is, not what it is used for.) The two main deviations from that are text and time, and both of them are causing us a lot of issues. My opinion is that writing your own text code is the best solution and the most "C" solution. The one proposal I have heard that I like is for C to get versions of the string-taking functions that accept an array and a length, so as not to force the convention of null termination in order to use things like fopen.
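For illustration, an array-plus-length variant might be shaped like the following. The name and signature are hypothetical, not actual proposal text; this version is just a shim over fopen, whereas a library-provided version would not need the copy:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical: open a file whose name is a length-delimited span, not NUL-terminated */
FILE *fopen_n(const char *name, size_t name_len, const char *mode)
{
    char *tmp = malloc(name_len + 1);
    if (!tmp) return NULL;
    memcpy(tmp, name, name_len);
    tmp[name_len] = '\0';
    FILE *f = fopen(tmp, mode);
    free(tmp);
    return f;
}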
It's been 50 years, so pretty much everything has been considered. In my opinion the mistake was not that arrays decay into pointers; rather, arrays should be pointers in the first place. An array should be seen as a number of values with a pointer pointing at the first one. I think adding a third version of the same functionality would just complicate things further. (&p[42] is a "slice" of an array.) Another thing I do not like about slices that store lengths is that they hide memory layout from the user, and that is not a very C thing to do.
You are right, sizeof is the other big difference. I think these differences are small enough that it was a mistake to separate the two. The similarities / differences do make them confusing.
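The sizeof difference in a few lines:

#include <stdio.h>

int main(void)
{
    int a[42];
    int *p = a;   /* here the array "decays" to &a[0] */
    printf("%zu %zu\n", sizeof a, sizeof p);   /* e.g. 168 vs 8, depending on platform */
    return 0;
}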
An array of pointers to arrays? Basically a `T**`. C#'s "jagged" arrays are like this, and to get a "true" 2D array you use a different syntax (a comma in the indexer):
using System.Diagnostics;
using System.Linq;

int[][] jagged; // an array of `int[]` (i.e. each element is a pointer to an `int[]`)
int[,] multidimensional; // a "true" 2D array laid out in memory sequentially

// allocate the jagged array; each `int[]` will be null until allocated separately
jagged = new int[10][];
Debug.Assert(jagged.All(elem => elem == null));
for (int i = 0; i < 10; i++)
{
    jagged[i] = new int[10]; // allocate the inner arrays
    Debug.Assert(jagged[i].All(elem => elem == 0)); // ints default to 0
}

// allocate the multidimensional array; each `int` will be `default`, which is 0
// element [i,j] will be at offset `10*i + j`
multidimensional = new int[10, 10];
Debug.Assert(multidimensional[3, 4] == 0);
Yes, this is what people with pre-C99 compilers that do not support variably modified types sometimes do. It is horrible (although there are some use cases).
I plan to bring such a proposal forward for the next version. Note that C already has everything to do this without much overhead, e.g. in C23 you can write:
int N = 10;
char buf[N] = { };
auto x = &buf;
and 'x' has a slice type that automatically remembers the size. This works today with GCC / clang (with extensions or C2X language mode: https://godbolt.org/z/cMbM57r46 ).
We simply cannot name it without referring to N, and we also cannot use it in structs (ouch).
How is this not a quality of implementation issue? Any implementation is free to track all sizes as much as they want with the current standard.
Either an implementation is forced to issue an error at run time if there is an out-of-bounds read/write, in which case it's a very different language from C, or it's a feature the as-if rule lets any implementation ignore.
Tracking sizes for purposes of bounds checking is QoI, and I think this is perfectly fine. But here we can also recover the size with sizeof, so it is also required for compliance.
And I agree that this is a misuse of auto. I only used it here to show that the type we are missing already exists inside the C compiler; we can name it only by constructing it again:
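For instance, continuing the snippet above but spelling the type out instead of using auto (a sketch of the same idea):

int N = 10;
char buf[N] = { };
char (*x)[N] = &buf;      /* the same type, written out by referring to N */
size_t s = sizeof *x;     /* evaluates to N (10) at run time */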