Wow, what a cool workflow. It looks like the interop promise of Apache Arrow is real. It's a great thing when your computer works as fast as you think, as opposed to sitting around waiting for queries to finish.
So this is an Apache Arrow database engine integrated into other databases? My main takeaway is that it's great to see more projects standardizing on Arrow and pushing it further down the stack.
Thank you for that. For the first time I finally understood what Apache Arrow is - until today, I didn't realize it's just a way to do SoA (struct of arrays) in different languages + a lot of buzzwords.
Looks good. It is nice to see how much influence the 'Engineers Shouldn't Write ETL' post had!
With Apache Arrow (https://arrow.apache.org/) I think the future looks very bright for both of our projects. It is important to have standard open source libraries and my early experiments have shown very good performance results.
Since most articles with titles like this can be answered with "no", I'd like to point out for anyone reading the comments but not the whole article that the answer in this one is "yes," since Apache Arrow targets a different workload and can be considerably more efficient for that workload.
Long story short: Apache Arrow defines a format for (tabular) data that allows efficient computation, easier interop, and data sharing between different frameworks.
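To make the "columnar format" idea concrete, here is a rough sketch in plain Python. It does not use Arrow or pyarrow at all; the standard-library `array` and `memoryview` types only stand in for the general idea of one contiguous typed buffer per column that a second consumer can read without copying. The table contents and the names `rows`, `columns`, and `shared_view` are made up for illustration.

```python
from array import array

# Row-oriented ("array of structs"): each record is its own object.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]

# Column-oriented ("struct of arrays"): one contiguous typed buffer per column.
# This only imitates the layout idea; it is not the Arrow format itself.
columns = {
    "id": array("q", (r["id"] for r in rows)),        # signed 64-bit integers
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
}

# A scan over one column reads only that column's buffer.
total_price = sum(columns["price"])

# A second consumer can view the same bytes without copying them.
shared_view = memoryview(columns["price"])
columns["price"][0] = 10.0  # a write through the owner is visible in the view
```

The point of agreeing on one such layout across languages is that the "second consumer" can be a different library or runtime, which is presumably where the interop and sharing benefits come from.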