Or a Vec<T> for storage and a Vec<usize> for tombstoning. There are lots of ways to solve this, and my experience is that you can beat a linked-list approach with the 'Vec plus Vec' approach (sketched below) for a lot of data sizes and operations.
Or a custom Vec that has a lower maximum bound, at which point you can start doing things like integer encoding / pointer swizzling.
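Concretely, a minimal sketch of the 'Vec plus Vec' idea in C terms (the names and the int payload are mine, just for illustration; allocation and growth are omitted):

#include <stddef.h>

struct Slots {
    int    *data;       // storage; tombstoned slots just keep stale bytes
    size_t *dead;       // stack of tombstoned indices, reused on insert
    size_t  len, ndead; // live length and tombstone count
};

// Remove = push the index onto the tombstone stack; nothing shifts.
static void slots_remove(struct Slots *s, size_t i) {
    s->dead[s->ndead++] = i;
}

// Insert = pop a tombstone if one exists, else append at the end.
static size_t slots_insert(struct Slots *s, int v) {
    size_t i = s->ndead ? s->dead[--s->ndead] : s->len++;
    s->data[i] = v;
    return i;
}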
When using C, if you want something from a higher-level language, usually you'd just do what that language's runtime does internally. So for a vector of heterogeneous objects, you could do a linked-list of pointers to the data objects. Something like:
Is it more work? Yes. But at this point I would stop and consider whether I really need a heterogeneous collection. Do we really need to store a list of objects with arbitrary sizes? What if we have 3 types of objects, in 3 arrays, and we store indices into those arrays?
struct Foo { unsigned char x; };
struct Bar { unsigned short x; };
struct Baz { unsigned long x; };
struct Foo foos[256];
struct Bar bars[256];
struct Baz bazzes[256];
// objTypeId: 0 = foo, 1 = bar, 2 = baz
struct VecItm { struct VecItm *next; unsigned int idx; char objTypeId; };
Or what if we can do something even better? What if Foo, Bar, and Baz all have a max size that isn't too wildly different? Can we store them all in an array directly?
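Sketching that directly-in-an-array idea (my own illustration, reusing the Foo/Bar/Baz structs above): a tagged union wastes a little padding when the sizes differ, but keeps everything in one contiguous block with no per-element allocation.

struct Any {
    char objTypeId;        // 0 = foo, 1 = bar, 2 = baz, as above
    union {
        struct Foo foo;
        struct Bar bar;
        struct Baz baz;
    } as;                  // sized to the largest of the three
};

struct Any items[256];     // one flat array, no pointers at all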
It's really not that bad. It can result in spaghetti-code where your macros are using other macros and they're spread all over a header file. But if you use them surgically, only when needed, they don't cause much trouble.
> I could just as well write a sed script
The benefit of C macros over sed is if you use some language-aware tooling like an IDE (such as CLion), it will syntax-check and type-check your macros.
Hmm… that would work, but at the cost of requiring the data to be immutable or use interior mutability. It also removes the size advantage of storing indices over pointers, unless you only make SmartRefs temporarily rather than storing them in your data structures.
Thanks, that's interesting. My own main optimization problem is always reducing the memory usage of large in-memory data sets. I found that languages like Java and Python, which don't support structured value types and use references/pointers everywhere, are basically unusable for my purposes.
Now, I wonder whether I could use Lisp for what I do. One basic necessity would be to have a way to define a structure like this:
struct s {
int32_t x;
int32_t y;
};
and then create an array of that type as struct s a[size], which uses exactly size * sizeof(struct s) bytes in memory. A further requirement would be a library of data structures (balanced trees, lists, hash tables) that supports storing such structs directly, without using pointers everywhere.
Do you have any idea whether that could work with SBCL?
As I'm sure you're aware, this gets lost in bikeshed land every time it comes up, but Rust could implement `Index<iN|uN>` pretty easily. Unlike C, you don't need an implicit cast to do the right thing.
Personally, when I have a data structure that uses non-`usize` indexes, I usually wrap my vector/array in a custom type that implements `Index` for whatever my common index types are.
I want to store data as binary, and generally I would just lay everything out in arrays: write the vector size as an int, the type id (an arbitrary value) as a char, write the data, done. I just have 2 functions to do that, and it's enough.
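Presumably something like this (write_vec is a made-up name; the layout is just size, then tag, then raw bytes):

#include <stdio.h>

// Layout: [int count][char typeId][count * elemSize raw bytes]
static void write_vec(FILE *f, const void *data, int count,
                      size_t elemSize, char typeId) {
    fwrite(&count,  sizeof count,  1, f);
    fwrite(&typeId, sizeof typeId, 1, f);
    fwrite(data, elemSize, (size_t)count, f);
}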
If you have fat data, binary is a wise solution. Generally, flattening data into arrays is not so complicated, and you gain speed. SQLite is also a good solution, although I'm not sure it's good for storing things like arrays of vec2, pictures...
I tried protocol buffers once and was really horrified by the size of the headers it required.
Definitely indices are a great option, but then you need a base pointer and a way to allocate off of it. That can add significant complexity. So it is all a tradeoff.
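Roughly, the base-pointer arrangement looks like this (a sketch with bump allocation only and no capacity check, to keep it short):

#include <stdint.h>

struct Node { uint32_t next; int value; };  // 32-bit index instead of a pointer

struct Pool {
    struct Node *base;   // every index is relative to this
    uint32_t used, cap;
};

// Allocating = handing out the next slot off the base pointer.
static uint32_t pool_alloc(struct Pool *p) { return p->used++; }

// Every dereference goes through the base; that's the added complexity.
static struct Node *pool_at(struct Pool *p, uint32_t i) { return &p->base[i]; }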
A concurrent lock-free vector would be nice to have. I just found one paper you might already know, 'Lock-Free Dynamically Resizable Arrays' (Dechev, Pirkelbauer, Stroustrup, 2006)? It's a good read. I like your approach, too. At the moment I'm trying something with a short singly-linked list and chunks that can vary in size (grow and shrink) and where adjacent nodes can be merged. Sizes range from 1 to 65 KB, with a simple 'compression' of unused slots. It might get interesting when the use case is less about adding/removing single members and more about adding/removing parts of other vectors/ranges of values into one vector.
And that is the exact joy of using a language like C or Go: if you need to sort your things, just add in two pointers and make it a list you sort on. Eight (or now sixteen) extra bytes and you have a well-tuned data structure. I don't want to use a bunch of generic structures, I want to use the ones that solve my particular use case best.
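That is, the intrusive-list pattern, something like (struct Thing is hypothetical):

struct Thing {
    struct Thing *prev, *next;  // the two extra pointers: 16 bytes on 64-bit
    int key;                    // the field you sort on
    // ...rest of the payload
};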
Coming from C++, the main thing I miss in C is generic data structures. Having to resort to macros to implement generic vectors (https://github.com/eduard-permyakov/permafrost-engine/blob/m...) is cumbersome, to say the least. It is also hard to beat the performance of the STL data structures when implementing something as seemingly straightforward as a vector type.
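For reference, the macro approach boils down to something like this hedged sketch (not the linked library's actual API):

#include <stdlib.h>

// Expands to a vector type plus a push function specialized for T.
#define DEFINE_VEC(T, Name)                                  \
    typedef struct { T *data; size_t len, cap; } Name;       \
    static int Name##_push(Name *v, T x) {                   \
        if (v->len == v->cap) {                              \
            size_t n = v->cap ? v->cap * 2 : 8;              \
            T *p = realloc(v->data, n * sizeof *p);          \
            if (!p) return -1;                               \
            v->data = p;                                     \
            v->cap = n;                                      \
        }                                                    \
        v->data[v->len++] = x;                               \
        return 0;                                            \
    }

DEFINE_VEC(int, IntVec)  // one line per element type you need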
Rust can totally help you do that properly. You can wrap the indices in a struct parameterized by a lifetime and regain all the same tools you would have with language pointers.
That snippet as a whole could be optimized. Individual components would be trickier though. To be clear, you aren't suggesting something as seemingly straightforward as shoving lists of integers into contiguous memory, right? With native big integers, integer subclasses, and whatnot I'd imagine that'd get thorny in a hurry.
Yeah, that's pretty similar. I keep a list of pointers to blocks, and they're all the same size, so you can easily address elements via a linear index. I tried doing the doubling-size thing like std::vector, but the addressing became more complex and I didn't find a need for it, though I might consider it in the future. But yeah, it seems like it's almost the same kind of structure. Thanks for the pointer!
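With equal-sized blocks the index math stays trivial, especially for a power-of-two block size (illustrative macros, not the library's code):

#define BLOCK_SIZE 1024                    // assumed power of two
#define BLOCK_OF(i) ((i) / BLOCK_SIZE)     // compiles down to a shift
#define SLOT_OF(i)  ((i) % BLOCK_SIZE)     // compiles down to a mask

// element i lives at blocks[BLOCK_OF(i)][SLOT_OF(i)]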
How often do you really need those interesting structures (and the memory fragmentation that comes with them)? I've seen countless times where a developer reached for std::hash_map/linked_list when there would never be more than 10 values in their dataset. In that case an array would be at least as fast and much easier on your data layout.
Also, if you're trying to implement lock-free data structures, then safe/unsafe pointer access is going to be the least of your worries :).
Deques are great. I think there are two ways I'd consider designing this library differently:
1) The resizing seems worse to me than a linked-list of fixed-size ring buffers which use a sync.Pool.
2) (more out there) some of the time when I've implemented this sort of thing, it's been to allocate memory for unmarshaling. It might be nice to have a mechanism to store values and allocate pointers.
Either encode with nil or, if space is not a problem, use an additional field that stores a bitmap.
That allows you to use structures with pointers without making nil semantics fuzzy.
You can even represent that bitmap as a vector and create a presence operator that is essentially a kind of intersection operation.
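In C terms, that might look like this (my sketch, assuming at most 64 slots so the bitmap fits in one word):

#include <stdint.h>

struct OptVec {
    uint64_t present;   // bit i set => slots[i] holds a real value
    int slots[64];      // the data itself never has fuzzy nil semantics
};

// The "presence operator": slots present in both a and b, i.e. an intersection.
static uint64_t both_present(uint64_t a, uint64_t b) { return a & b; }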
Thanks! What kind of performance do you get with some of those more intricate pointer structures? My own research right now is focused on how to build some common abstract data types with less pointer indirection, so as to be more easily vectorizable.