
Wow, what a cool workflow. It looks like the interop promise of Apache Arrow is real. It's a great thing when your computer works as fast as you think, as opposed to sitting around waiting for queries to finish.



Given how powerful Apache Arrow is, there's a lot of promise to this library.

Now combine that with Apache Arrow and you can do some interesting things.

Great to see that you're supporting Apache Arrow! That makes it so much easier to gradually switch over.

Yay! Apache Arrow is a great project. Whenever I use it, people think I’m some kind of genius.

It should get a lot more press.


So this is an Apache Arrow database engine integrated into other databases? My main takeaway is that it's great to see more projects standardizing on Arrow and pushing it further down the stack.

I love seeing stuff like this. Getting a better understanding of the layers underlying high-performance data analytics is super interesting to me.

This project seems very similar to Apache Arrow. If OP or anyone else is around to explain why one might be used over the other, that would be great.


Apache Arrow?

As far as I understand it, it's more for fast cross-process in-memory access (e.g. number crunching), but it's mutable.


Thank you for that. For the first time I finally understood what Apache Arrow is. Until today, I didn't realize it's just a way to do SoA (structure of arrays) in different languages + a lot of buzzwords.
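For anyone unfamiliar with the SoA idea mentioned here, a rough sketch in plain Python (standard library only, not the Arrow API; the `id` and `price` columns are made up for illustration) of the difference between row-oriented records and one typed buffer per column:

```python
from array import array

# Array of Structures (AoS): one record per row, fields interleaved.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]

# Structure of Arrays (SoA): one contiguous, typed buffer per column.
# "q" is a signed 64-bit integer, "d" is a 64-bit float.
columns = {
    "id": array("q", (r["id"] for r in rows)),
    "price": array("d", (r["price"] for r in rows)),
}

# A scan over one column touches only that column's buffer.
total_price = sum(columns["price"])
```

This only shows the layout idea. Arrow itself additionally specifies things like validity bitmaps and nested types, which this sketch ignores.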


Looks good. It is nice to see how much influence the 'Engineers Shouldn't Write ETL' post had!

With Apache Arrow (https://arrow.apache.org/) I think the future looks very bright for both of our projects. It is important to have standard open source libraries and my early experiments have shown very good performance results.


Apache Arrow

Since most articles with titles like this can be answered with "no", I'd like to point out for anyone reading the comments but not the whole article that the answer in this one is "yes," since Apache Arrow targets a different workload and can be considerably more efficient for that workload.

I didn't find Apache Arrow in this repo. I would like to learn more about your experience with using Arrow, the performance improvements, and any lessons learned.

Is this the project you guys referenced using Apache Arrow for?

I really like this post.

Is there a list of major projects that are leveraging Apache Arrow?


Apache Arrow sits in a similar niche, but it has support for multiple languages.

Doesn't Apache Arrow solve this problem partially already?

Long story short: Apache Arrow defines a format for (tabular) data to allow efficient computation and easier interop and sharing data between different frameworks.
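To make the "sharing data between frameworks" part a bit more concrete, here is a loose analogy using Python's built-in buffer protocol rather than Arrow itself. The two "consumers" are hypothetical stand-ins for separate libraries that agree on the same memory layout:

```python
from array import array

# One column of a table, stored as a contiguous buffer of 64-bit floats.
prices = array("d", [9.5, 12.0, 7.25])

# Two hypothetical consumers look at the same memory through the
# buffer protocol; neither makes its own copy of the data.
view_a = memoryview(prices)
view_b = memoryview(prices)

# Both views are backed by the same underlying object.
same_backing = view_a.obj is view_b.obj

# Slicing a view is also zero-copy: it is a window onto the same buffer.
tail = view_a[1:]
tail_values = tail.tolist()
```

As I understand it, the point of a shared format is similar: if both sides agree on the layout, data can be handed over without a serialize/deserialize step in between.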

This article might help; it explains Arrow's performance benefits: https://www.dremio.com/apache-arrow-explained/