If you’re applying arbitrary invariants that aren’t codified in the type system, you may as well just use raw pointers.
Offsets cause bugs when you do things like assume your strings are of a particular length; that’s why they’re a bad choice.
Use a crate that provides a safe abstraction, don’t roll it by hand yourself, and the safe abstraction can be implemented in whatever way you want; but it’s not safe if it doesn’t enforce the invariants.
I was recently bitten using a similar technique to the original code for an intrusive type in C. Using manual, non-typesafe raw offsets for things can definitely lead to nasty bugs.
Automating the calculation of offsets and assigning pointers definitely eliminates a lot of potential bugs, but I do wonder if this isn't a bigger deficiency in C++. Why force storing an extra pointer per array? I am still not sure why C++ doesn't allow FAMs [1].
There's probably a question of which would be more efficient, storing pointers in the object and calculating the offset from the pointer, or dynamically calculating the offset within the object. Still, it doesn't seem like a problem developers should need to solve.
Yeah, for me it's more about the ergonomics (and imprecision) of using indexes.
It's a perfectly workable approach, but passing around offsets feels like I'm breaking the contract a bit. The compiler doesn't know which index goes with which data structure, I'm just asking the compiler to trust me that I'm pairing them correctly.
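One way to get part of that contract back is to give the index a type parameter. Below is a minimal sketch, not any particular crate's API: `Arena`, `Idx`, and `demo` are hypothetical names chosen for illustration. Note that it only ties an index to the element type, so two arenas holding the same type can still be mixed up.

```rust
use std::marker::PhantomData;

/// Hypothetical typed index: only usable with an `Arena<T>` of the same `T`.
pub struct Idx<T> {
    raw: u32,
    _marker: PhantomData<fn() -> T>,
}

// Manual impls so that `Idx<T>` is `Copy` even when `T` is not.
impl<T> Clone for Idx<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Idx<T> {}

/// Hypothetical arena that hands out typed indices instead of pointers.
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Stores `value` and returns an index tagged with its type.
    pub fn push(&mut self, value: T) -> Idx<T> {
        let raw = self.items.len() as u32;
        self.items.push(value);
        Idx { raw, _marker: PhantomData }
    }

    /// Looks up an element; panics if the index is out of bounds.
    pub fn get(&self, idx: Idx<T>) -> &T {
        &self.items[idx.raw as usize]
    }
}

pub fn demo() -> (String, u64) {
    let mut names: Arena<String> = Arena::new();
    let mut sizes: Arena<u64> = Arena::new();
    let n = names.push("root".to_string());
    let s = sizes.push(4096);
    // `names.get(s)` would be a compile error: expected `Idx<String>`, found `Idx<u64>`.
    (names.get(n).clone(), *sizes.get(s))
}
```

The boilerplate cost mentioned above is still there; the sketch only moves the "trust me" part from every call site into one small type.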
Also it tends to be more boilerplate than just having a direct pointer stored inside the data structure.
Don't get me wrong, I understand all the problems with pointers, but the UX is better in a lot of cases.
Offsets are important in a programming language where you are regularly dealing with memory addresses, not nearly so much in one where memory access is almost completely abstracted away.
There is no problem using raw pointers for non-owning pointers. Also, a lot of safe abstractions can be built on top of raw pointers (smart pointers are the obvious example).
I don't like offsets because I don't think they are that straightforward. First, the offset is useless without the base pointer. So that either needs to be global or you need to store that too (and now you have fat pointers). But the base can change, so you really need to store a reference to it. Yuck.
And you still can't delete anything when using offsets. Forgetting about random deletions (which are important) an array works for a stack, but not a fifo, and fifos are at least as common.
The offset addressing modes all start from zero. You could hack around it in some places by storing the pointer to an element before but that's clearly going against how it is intended to be used.
For example if you have an array that you cast between different sized types (e.g. uint64 to uint8) then you have to change the pointer value!
> Others don't as it breaks things like mmap of prebuilt hashtables.
Can you elaborate? An mmap of prebuilt hash tables doesn’t work well in practice if the mmapped area contains pointers, regardless of provenance, and an mmap of a hashtable that uses integer offsets doesn’t involve pointers.
The only real issue I see is if the mmap contains objects but isn’t itself laid out like an object in the language in question, and you need to generate a pointer to one of those objects. (So mmapping an array of structs that don’t contain pointers is fine, but mmapping a mess that contains integer offsets referencing various things in the mmap that don’t nicely line up like an array is harder.)
But I imagine that a pointer provenance system could have an operation that takes as input an mmap, an offset and a type, and returns a pointer to the object with the type in question at the offset in question. It would check that the type makes sense (no pointers!) and could, if the required degree of safety demands it, also check for invalid aliasing.
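As a rough sketch of what such an operation might look like in Rust, assuming a plain `&[u8]` stands in for the mapped region: `Plain`, `view_at`, and `demo` are hypothetical names, not an existing API. The "no pointers" check is expressed as an unsafe marker trait, and aliasing is handled by borrowing the buffer immutably.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical marker for types with no pointers, no padding, and no
/// invalid bit patterns. Unsafe to implement: the implementer vouches for it.
pub unsafe trait Plain: Copy {}
unsafe impl Plain for u8 {}
unsafe impl Plain for u32 {}
unsafe impl Plain for u64 {}

/// Returns a typed view of the bytes at `offset`, or `None` if the range is
/// out of bounds or the address is not suitably aligned for `T`.
pub fn view_at<T: Plain>(map: &[u8], offset: usize) -> Option<&T> {
    let end = offset.checked_add(size_of::<T>())?;
    let bytes = map.get(offset..end)?; // bounds check
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<T>() != 0 {
        return None; // misaligned for T
    }
    // SAFETY: the range is in bounds and aligned, `T: Plain` means any bit
    // pattern is a valid `T`, and the shared borrow of `map` keeps the bytes
    // alive and unmodified for the lifetime of the returned reference.
    Some(unsafe { &*(ptr as *const T) })
}

pub fn demo() -> (Option<u32>, bool, bool) {
    let mut buf = vec![0u8; 16];
    // Find the first offset in `buf` that is aligned for u32.
    let a = align_of::<u32>();
    let off = (a - (buf.as_ptr() as usize) % a) % a;
    buf[off..off + 4].copy_from_slice(&7u32.to_ne_bytes());
    (
        view_at::<u32>(&buf, off).copied(),      // aligned and in bounds
        view_at::<u32>(&buf, off + 1).is_none(), // misaligned
        view_at::<u32>(&buf, 14).is_none(),      // runs past the end
    )
}
```

Because the pointer is derived from the slice itself, it carries the slice's provenance; no integer-to-pointer cast is involved.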
You... haven't worked on a large codebase, have you?
The part that baffles me about this entire post is that it's trivial to obtain pointers to member functions legally, without the fragility associated with guessing VTable offsets.
Well, the point is safety for existing code. If you can annotate pointers to match them with their bound you can as easily replace them with a span and avoid needing compiler heroics.
Edit: unless you absolutely need the change to be ABI stable, but even then there are ways around that.
About raw pointers: no, you can’t do the same as in C++. There’s no malloc/free, and placement new is still unstable. Also they feel foreign to the language, and sometimes I can’t even get them from the standard collections (like when I need 2 mutable pointers to different elements).
About the safety, it’s important to understand there’s a price. In some cases, it makes simple things hard to implement, or costs performance.
For example, there’re algorithms out there that modify items of the same collection at the same time, both single threaded like sorting, and parallel.
Can such an algorithm cause a data race? Yes.
Is it good the compiler tells me about that? Sure.
Is it good the compiler prevents me from doing that at all, and forces me to code some workarounds, that will cost me time to implement, will be ugly, and/or will sacrifice runtime performance? Not sure, I like having a choice.
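For the narrow case of two mutable references to different elements of one slice, the standard library's `split_at_mut` is one of the workarounds in question. A minimal sketch follows; `two_mut` and `demo` are hypothetical helper names, and this does not address the broader performance or ergonomics point.

```rust
/// Returns mutable references to two distinct elements of a slice, or `None`
/// if the indices are equal or out of bounds.
pub fn two_mut<T>(items: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    if i == j || i >= items.len() || j >= items.len() {
        return None;
    }
    if i < j {
        // `left` covers ..j and `right` covers j.., so they cannot overlap.
        let (left, right) = items.split_at_mut(j);
        Some((&mut left[i], &mut right[0]))
    } else {
        let (left, right) = items.split_at_mut(i);
        Some((&mut right[0], &mut left[j]))
    }
}

pub fn demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    if let Some((a, b)) = two_mut(&mut v, 0, 2) {
        std::mem::swap(a, b); // both references are live at the same time
    }
    v
}
```

The cost is an extra bounds check and a branch per call, which is the kind of trade-off the comment above is describing.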
Those change the type annotations for a memory location. Also if you tell me there are no concurrency gotchas for this then I’ll tell you that you have a single threaded interpreter or you’re lying.
We are talking about retaining the type and swapping the pointers from an independent object to a set of offsets into a struct of arrays.
Are you talking about Go-style explicit pointers? I don't know if it is worth adding them with different semantics and dealing with nulls when you can have value types with an attribute. Pointers as an abstraction also leak implementation details.