Or a Vec<T> for storage and a Vec<usize> for tombstoning. There are lots of ways to solve this, and my experience is that you can beat a linked-list approach with the 'Vec plus Vec' approach (sketched below) for a lot of data sizes and operations.
Or a custom Vec that has a lower maximum bound, at which point you can start doing things like integer encoding / pointer swizzling.
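Concretely, a minimal sketch of the 'Vec plus Vec' idea in C terms (the names and the int payload are mine, just for illustration; allocation and growth are omitted):

#include <stddef.h>

struct Slots {
    int    *data;       // storage; tombstoned slots just keep stale bytes
    size_t *dead;       // stack of tombstoned indices, reused on insert
    size_t  len, ndead; // live length and tombstone count
};

// Remove = push the index onto the tombstone stack; nothing shifts.
static void slots_remove(struct Slots *s, size_t i) {
    s->dead[s->ndead++] = i;
}

// Insert = pop a tombstone if one exists, else append at the end.
static size_t slots_insert(struct Slots *s, int v) {
    size_t i = s->ndead ? s->dead[--s->ndead] : s->len++;
    s->data[i] = v;
    return i;
}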
When using C, if you want something from a higher-level language, usually you'd just do what that language's runtime does internally. So for a vector of heterogeneous objects, you could do a linked-list of pointers to the data objects. Something like:
Is it more work? Yes. But at this point I would stop and consider whether I really need a heterogeneous collection. Do we really need to store a list of objects with arbitrary sizes? What if we have 3 types of objects, in 3 arrays, and we store indices into those arrays?
struct Foo { unsigned char x; };
struct Bar { unsigned short x; };
struct Baz { unsigned long x; };
struct Foo foos[256];
struct Bar bars[256];
struct Baz bazzes[256];
// objTypeId: 0 = foo, 1 = bar, 2 = baz
struct VecItm { struct VecItm *next; unsigned int idx; char objTypeId; };
Or what if we can do something even better? What if Foo, Bar, and Baz all have a max size that isn't too wildly different? Can we store them all in an array directly?
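Sketching that directly-in-an-array idea (my own illustration, reusing the Foo/Bar/Baz structs above): a tagged union wastes a little padding when the sizes differ, but keeps everything in one contiguous block with no per-element allocation.

struct Any {
    char objTypeId;        // 0 = foo, 1 = bar, 2 = baz, as above
    union {
        struct Foo foo;
        struct Bar bar;
        struct Baz baz;
    } as;                  // sized to the largest of the three
};

struct Any items[256];     // one flat array, no pointers at all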
It's really not that bad. It can result in spaghetti-code where your macros are using other macros and they're spread all over a header file. But if you use them surgically, only when needed, they don't cause much trouble.
> I could just as well write a sed script
The benefit of C macros over sed is if you use some language-aware tooling like an IDE (such as CLion), it will syntax-check and type-check your macros.
Hmm… that would work, but at the cost of requiring the data to be immutable or use interior mutability. It also removes the size advantage of storing indices over pointers, unless you only make SmartRefs temporarily rather than storing them in your data structures.
Thanks, that's interesting. My own main optimization problem is always reducing the memory usage of large in-memory data sets. I found that languages like Java and Python, which don't support structured value types and use references/pointers everywhere, are basically unusable for my purposes.
Now, I wonder whether I could use Lisp for what I do. One basic necessity would be to have a way to define a structure like this:
struct s {
int32_t x;
int32_t y;
};
and then create an array of that type as struct s a[size], which uses exactly size * sizeof(struct s) bytes in memory. A further requirement would be a library of data structures (balanced trees, lists, hash tables) that supports storing such structs directly, without using pointers everywhere.
Do you have any idea whether that could work with SBCL?
As I'm sure you're aware, this gets lost in bikeshed land every time it comes up, but Rust could implement `Index<iN|uN>` pretty easily. Unlike C, you don't need an implicit cast to do the right thing.
Personally, when I have a data structure that uses non-`usize` indexes, I usually wrap my vector/array in a custom type that implements `Index` for whatever my common index types are.
I want to store data as binary, and generally I would just lay everything out in arrays: write the vector size as an int, the type id (an arbitrary value) as a char, write the data, done. I just have 2 functions to do that, and it's enough.
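Presumably something like this (write_vec is a made-up name; the layout is just size, then tag, then raw bytes):

#include <stdio.h>

// Layout: [int count][char typeId][count * elemSize raw bytes]
static void write_vec(FILE *f, const void *data, int count,
                      size_t elemSize, char typeId) {
    fwrite(&count,  sizeof count,  1, f);
    fwrite(&typeId, sizeof typeId, 1, f);
    fwrite(data, elemSize, (size_t)count, f);
}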
If you have fat data, binary is a wise solution. Generally, flattening data into arrays is not so complicated, and you gain speed. SQLite is also a good solution, although I'm not sure it's good for storing things like arrays of vec2, pictures...
I tried protocol buffers once and was really horrified by the size of the headers it required.
Definitely indices are a great option, but then you need a base pointer and a way to allocate off of it. That can add significant complexity. So it is all a tradeoff.
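Roughly, the base-pointer arrangement looks like this (a sketch with bump allocation only and no capacity check, to keep it short):

#include <stdint.h>

struct Node { uint32_t next; int value; };  // 32-bit index instead of a pointer

struct Pool {
    struct Node *base;   // every index is relative to this
    uint32_t used, cap;
};

// Allocating = handing out the next slot off the base pointer.
static uint32_t pool_alloc(struct Pool *p) { return p->used++; }

// Every dereference goes through the base; that's the added complexity.
static struct Node *pool_at(struct Pool *p, uint32_t i) { return &p->base[i]; }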
A concurrent lock-free vector would be nice to have. I just found one paper you might already know, 'Lock-Free Dynamically Resizable Arrays' (Dechev, Pirkelbauer, Stroustrup, 2006)? It's a good read. I like your approach, too. At the moment I'm trying something with a short singly-linked list and chunks that can vary in size (grow and shrink) and where adjacent nodes can be merged. Sizes range from 1 to 65 KB, with a simple 'compression' of unused slots. It might get interesting when the use case is less about adding/removing single members and more about adding/removing parts of other vectors/ranges of values into one vector.
And that is the exact joy of using a language like C or Go: if you need to sort your things, just add in two pointers and make it a list you sort on. Eight (or now sixteen) extra bytes and you have a well-tuned data structure. I don't want to use a bunch of generic structures, I want to use the ones that solve my particular use case best.
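That is, the intrusive-list pattern, something like (struct Thing is hypothetical):

struct Thing {
    struct Thing *prev, *next;  // the two extra pointers: 16 bytes on 64-bit
    int key;                    // the field you sort on
    // ...rest of the payload
};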
Coming from C++, the main thing I miss in C is generic data structures. Having to resort to macros to implement generic vectors (https://github.com/eduard-permyakov/permafrost-engine/blob/m...) is cumbersome, to say the least. It is also hard to beat the performance of the STL data structures when implementing something as seemingly straightforward as a vector type.
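For reference, the macro approach boils down to something like this hedged sketch (not the linked library's actual API):

#include <stdlib.h>

// Expands to a vector type plus a push function specialized for T.
#define DEFINE_VEC(T, Name)                                  \
    typedef struct { T *data; size_t len, cap; } Name;       \
    static int Name##_push(Name *v, T x) {                   \
        if (v->len == v->cap) {                              \
            size_t n = v->cap ? v->cap * 2 : 8;              \
            T *p = realloc(v->data, n * sizeof *p);          \
            if (!p) return -1;                               \
            v->data = p;                                     \
            v->cap = n;                                      \
        }                                                    \
        v->data[v->len++] = x;                               \
        return 0;                                            \
    }

DEFINE_VEC(int, IntVec)  // one line per element type you need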
Rust can totally help you do that properly. You can wrap the indices in a struct parameterized by a lifetime and regain all the same tools you would have with language pointers.
That snippet as a whole could be optimized. Individual components would be trickier though. To be clear, you aren't suggesting something as seemingly straightforward as shoving lists of integers into contiguous memory, right? With native big integers, integer subclasses, and whatnot I'd imagine that'd get thorny in a hurry.
Yeah, that's pretty similar. I keep a list of pointers to blocks, and they're all the same size, so you can easily address elements via a linear index. I tried doing the doubling-size thing like std::vector, but the addressing became more complex and I didn't find a need for it, though I might consider it in the future. But yeah, it seems like it's almost the same kind of structure. Thanks for the pointer!
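With equal-sized blocks the index math stays trivial, especially for a power-of-two block size (illustrative macros, not the library's code):

#define BLOCK_SIZE 1024                    // assumed power of two
#define BLOCK_OF(i) ((i) / BLOCK_SIZE)     // compiles down to a shift
#define SLOT_OF(i)  ((i) % BLOCK_SIZE)     // compiles down to a mask

// element i lives at blocks[BLOCK_OF(i)][SLOT_OF(i)]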
How often do you really need those interesting structures (and the memory fragmentation that comes with them)? I've seen countless times where a developer reached for std::hash_map/linked_list when there would never be more than 10 values in their dataset. In that case an array would be at least as fast and much easier on your data layout.
Also, if you're trying to implement lock-free data structures, then safe/unsafe pointer access is going to be the least of your worries :).
Deques are great. I think there are two ways I'd consider designing this library differently:
1) The resizing seems worse to me than a linked-list of fixed-size ring buffers which use a sync.Pool.
2) (more out there) some of the time when I've implemented this sort of thing, it's been to allocate memory for unmarshaling. It might be nice to have a mechanism to store values and allocate pointers.
Either encode with nil or, if space is not a problem, use an additional field that stores a bitmap.
That allows you to use structures with pointers without making nil semantics fuzzy.
You can even represent that bitmap as a vector and create a presence operator that is essentially a kind of intersection operation.
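In C terms, that might look like this (my sketch, assuming at most 64 slots so the bitmap fits in one word):

#include <stdint.h>

struct OptVec {
    uint64_t present;   // bit i set => slots[i] holds a real value
    int slots[64];      // the data itself never has fuzzy nil semantics
};

// The "presence operator": slots present in both a and b, i.e. an intersection.
static uint64_t both_present(uint64_t a, uint64_t b) { return a & b; }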
Thanks! What kind of performance do you get with some of those more intricate pointer structures? My own research right now is focused on how to build some common abstract data types with less pointer indirection, so as to be more easily vectorizable.