What you think of as "vector" processing is currently being used by compilers to speed up things you didn't think were vectorizable. This is possible only because these instructions are pretty cheap latency-wise. By introducing huge latency, you'd be ruining performance of autovectorization, which accounts for a lot of the performance gains in the past decade.
It seems like you're saying you'd rather have a slower implementation, given that many of the single instructions useful for this sort of thing aren't available in the Vector API and must be built from sequences of Vector methods, each of which is itself implemented with multiple instructions.
By vector operations do you mean using something like the Accelerate framework? Or SSE/NEON intrinsics? Or just restructuring your code so that your compiler can attempt to vectorize it where possible?
It's not the vector instructions; it's the careful scheduling of instructions so you spend just enough time manipulating pointers relative to crunching actual data, all while respecting dependency chains and memory stall times. (Hyperthreading helps a lot with the latter; see Nvidia Maxas (Nervana Systems now) for details on how a flexible number of threads lets you weigh hiding memory-load stalls against the register pressure that causes more data shuffling.)
Unfortunately, the big-O complexity argument is generally bullshit in practice and you really need to profile. Multiple very respected authors have found that under typical workloads, `vector` performs very well on a lot of machines.
[citation needed, in the form of actual benchmarks]
The thing is that list fusion and whatnot is all just there to get around the handicap that was placed there in the first place by the language paradigm. So you start by insisting on shooting yourself in the foot, then put lots of armor on your boot so the bullet hopefully bounces off.
I assume by "vectors" you mean arrays ... there is no case in which this can be faster than arrays, because in the limit, if the list fusion system works perfectly, it is just making an array. A thing can't be faster than itself.
Vector can be slow if you create and destroy them a lot, since they allocate. You can work around that to some extent by providing a custom allocator, but using something like SmallVector or absl::InlinedVector can be much faster when the N is known.
Vector API is very far from being hardware intrinsics. Unfortunately for Java programmers, it’s merely a least common denominator of SIMD instructions across different ISAs. This makes the feature very limited by design, IMO borderline useless.
Cool to see languages besides C running on small hardware.
I would guess that memory consumption, not speed, is the limiting factor vs. C. I skimmed through the source code and couldn't find a way to define heterogeneous packed data types (i.e. structs). That would be a serious turn-off for me. Cons cells are a lot of overhead. At least it has vectors.