Hacker Read

indeedmug · 2023-11-07 12:56:00

Wow, what a cool workflow. It looks like the interop promise of Apache Arrow is real. It's a great thing when your computer works as fast as you think, as opposed to sitting around waiting for queries to finish.

nextworddev | karma 1160 | avg karma 2.28 · | 2023-04-05 14:34:10

Given how powerful Apache Arrow is, there's a lot of promise to this library.

ldng | karma 1048 | avg karma 1.32 · | 2019-02-27 11:42:05

Now combine that with Apache Arrow and you can do some interesting things.

ah- | karma 1074 | avg karma 5.0 · | 2018-12-14 08:21:48

Great to see that you're supporting Apache Arrow! That makes it so much easier to gradually switch over.

pyuser583 | karma 4441 | avg karma 1.63 · | 2020-07-27 15:32:03+00:00

Yay! Apache Arrow is a great project. Whenever I use it, people think I’m some kind of genius.

It should get a lot more press.

liminal | karma 1186 | avg karma 4.56 · | 2022-09-01 11:20:33

So this is an Apache Arrow database engine integrated into other databases? My main takeaway is that it's great to see more projects standardizing on Arrow and pushing it further down the stack.

dafelst | karma 922 | avg karma 5.76 · | 2021-06-21 16:22:00+00:00

I love seeing stuff like this. Getting a better understanding of the layers underlying high-performance data analytics is super interesting to me.

This project seems very similar to Apache Arrow. If OP or anyone else is around to explain why one might be used over the other, that would be great.

deepsun | karma 3058 | avg karma 1.77 · | 2023-04-06 01:05:49

Apache Arrow?

As far as I understand it, it's more for fast cross-process in-memory access (e.g. number crunching), but it's mutable.

TeMPOraL | karma 106045 | avg karma 3.04 · | 2020-03-03 22:03:57+00:00

Thank you for that. For the first time I finally understood what Apache Arrow is. Until today, I didn't realize it's just a way to do SoA (struct of arrays) in different languages, plus a lot of buzzwords.
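
For anyone unfamiliar with the term, here is a rough sketch of the difference between an array of structs and a struct of arrays. It uses only the Python standard library, not the Arrow API, and the names `rows`, `columns`, and `total_price` are made up for illustration.

```python
from array import array

# Array of structs (AoS): one record object per row.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 3.0},
    {"id": 3, "price": 7.25},
]

# Struct of arrays (SoA): one contiguous, typed buffer per column.
columns = {
    "id": array("q", (r["id"] for r in rows)),        # 64-bit signed integers
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
}

# A column-wise aggregate only has to walk a single contiguous buffer.
total_price = sum(columns["price"])
```

The columnar form is what makes scans and aggregates over one field cheap, since the values for that field sit next to each other in memory.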

jkestelyn | karma 287 | avg karma 3.38 · | 2016-02-18 19:16:18+00:00

More technical details available here:

http://blog.cloudera.com/blog/2016/02/introducing-apache-arr...

seddonm1 | karma 172 | avg karma 3.66 · | 2021-03-25 07:09:36+00:00

Looks good. It is nice to see how much influence the 'Engineers Shouldn't Write ETL' post had!

With Apache Arrow (https://arrow.apache.org/) I think the future looks very bright for both of our projects. It is important to have standard open source libraries and my early experiments have shown very good performance results.

tebeka | karma 383 | avg karma 5.8 · | 2020-10-28 19:48:03+00:00

Apache Arrow

lambda | karma 10355 | avg karma 5.27 · | 2020-12-01 21:45:19+00:00

Since most articles with titles like this can be answered with "no", I'd like to point out for anyone reading the comments but not the whole article that the answer in this one is "yes," since Apache Arrow targets a different workload and can be considerably more efficient for that workload.

antisocial | karma 189 | avg karma 2.3 · | 2019-02-20 11:50:12

I didn't find Apache Arrow in this repo. I would like to learn more about your experience with using Arrow, the performance improvements, and any lessons learned.

monstrado | karma 414 | avg karma 2.45 · | 2022-03-30 10:17:08

Is this the project you guys referenced using Apache Arrow for?

dogruck | karma 922 | avg karma 1.32 · | 2017-09-26 04:23:58+00:00

I really like this post.

Is there a list of major projects that are leveraging Apache Arrow?

juujian | karma 1911 | avg karma 4.45 · | 2023-07-15 07:30:54

Apache Arrow sits in a similar niche, but it has support for multiple languages.

jmakov | karma 253 | avg karma 1.12 · | 2019-02-27 14:48:17+00:00

Doesn't Apache Arrow solve this problem partially already?

vletal | karma 921 | avg karma 3.11 · | 2021-05-05 17:49:34+00:00

Long story short: Apache Arrow defines a format for (tabular) data to allow efficient computation and easier interop and sharing data between different frameworks.
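
As a loose illustration of the sharing part, the sketch below passes a typed column buffer to a second consumer without copying it. It assumes nothing about Arrow's real API; it only uses Python's standard buffer protocol, and `price_column`, `view`, and `shared_first_value` are hypothetical names.

```python
from array import array

# Producer: a "price" column stored as one contiguous buffer of float64 values.
price_column = array("d", [9.5, 3.0, 7.25])

# Consumer: wraps the same memory through the buffer protocol; nothing is copied.
view = memoryview(price_column)

# Both sides agree on the layout, so the consumer can interpret the bytes directly.
layout = (view.format, view.itemsize, len(view))

# A write through the producer is visible to the consumer,
# which shows the two handles refer to the same memory.
price_column[0] = 10.0
shared_first_value = view[0]
```

The point is the agreed layout: once producer and consumer share a definition of how a column is laid out in memory, handing data over does not require serializing or converting it.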

cocobro | karma 12 | avg karma 0.38 · | 2020-07-27 15:52:59

This article might help; it explains Arrow's performance benefits: https://www.dremio.com/apache-arrow-explained/