Hacker Read

laplacesdemon48 | karma 374 | avg karma 5.19 · 2021-01-18 15:59:23+00:00

I haven't heard of queryverse, thank you for that. This also brings up a good point I wanted to highlight.

I get that Julia is a young language with a growing ecosystem. But the lack of "one obvious way to do something" may scare new users away.

"I want to quickly wrangle data. Do I use Query.jl, DataFramesMeta.jl, SplitApplyCombine.jl or something else?"

"I need pipes to help me wrangle data more efficiently do I use Base Julia, Chain.jl, Pipe.jl, or Lazy.jl?"

For a new R user it seems so much simpler:

1. run "library(dplyr)" 2. Google "how to XYZ in dplyr" 3. ??? 4. Profit

reply

oxinabox | karma 671 | avg karma 4.3 · 2021-01-18 10:41:45

I mean I get yout point. Julia has a bit of a Lisp's Curse http://winestockwebdesign.com/Essays/Lisp_Curse.html Writing a performant and easy to use data wrangling library for R is a bunch of work and means dealing with C/C++ etc. So few people are willing to do so, and just contribute to a small number of libraries like dplyr. (I feel like there are at least 2 other major compeditors to that in R?) Where as in julia it's really easy to write a new data wrangling library. Its just not that much work. So people: A) do it for just fun / student projects (None of those ones are though). B) do it because they have a nontrivially resolvable opinion (e.g. Queryverse has a marginally more performant but marginally harder to use system for missing data)

Nice thing about julia, especially for tabular data (thanks to Tables.jl), is everything works together. It's actually completely possible to mix and match all of those libraries in a single data processing pipeline. Which while is generally a weird thing to do, it does mean if you have a external package uses any of them it works into a pipeline of another. (One common case is that queryverse has CSVFiles.jl, but CSV.jl actually is generally faster, and you can just swap one for ther other, inside a Query.jl pipeline)

I absolutely argee this makes learning harder.

---

Also that particular example:

> "I need pipes to help me wrangle data more efficiently do I use Base Julia, Chain.jl, Pipe.jl, or Lazy.jl?"

It's piping. Something would have to massively be screwed up if any of those options were more or less efficient than the others. The only question is what semantics do you want. Each is pretty opinionated about how piping should look.

reply

kazinator | karma 30751 | avg karma 1.78 · 2021-01-18 17:57:45+00:00

The Lisp Curse was written by then inexperienced web developer, with (then, and likely now still) zero Lisp experience, based on extrapolating something he read about Lisp in an essay by Mark Tarver. He prefers it not be submitted to HN due to the embarrassment, yet for some reason keeps the article up (probably because it generates traffic).

phonebucket | karma 1654 | avg karma 5.01 · 2021-01-18 13:09:31

> For a new R user it seems so much simpler:

> 1. run "library(dplyr)" 2. Google "how to XYZ in dplyr" 3. ??? 4. Profit

I beg to differ here. There’s much to be said for using data.table and base R instead of the tidyverse.

This article is worth a read in my view: https://github.com/matloff/TidyverseSkeptic

reply

disgruntledphd2 | karma 6481 | avg karma 1.73 · 2021-01-18 13:54:23

Yeah, NSE (non-standard evaluation) is really annoying to work with in dplyr/tidyverse codebases, and this definitely inhibits people from building on top of them.

They are an 80% solution for a lot of data analytic needs, but base-R is 100% the right choice if you want your code to run for a long time without needing updates.

I've never really gotten into data.table for some reason, normally dplyr is fast enough, or I'm using something more efficient than R.

reply