Administrative Scripting with Julia (github.com)
88 points by xgdgsc | 2023-04-06 | 71 comments




I have created job schedulers/process managers in Julia. There are good facilities for these "systems" level tasks because Julia is based on the very portable libuv[1]. You're able to pipe processes around [2]. Interpolation into "shell command mode" has good ergonomics (like automatic safe escaping and quoting) and also solid semantics (for instance, you can write `-i$files` and it expands to `-ifile1 -ifile2` for an array [3]). I am unsure why there isn't a Cromwell [4] for Julia in the public domain yet...

[1]: https://libuv.org/

[2]: https://docs.julialang.org/en/v1/manual/running-external-pro...

[3]: https://docs.julialang.org/en/v1/manual/running-external-pro...

[4]: https://cromwell.readthedocs.io/en/stable/
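
As a small illustration of [2] and [3] (the file names and the grep/sort invocation here are made up for the example):

    # array interpolation expands to separate, safely quoted words;
    # `-i$files` would likewise expand to `-iaccess.log -ierror.log`
    files = ["access.log", "error.log"]
    cmd = `grep -c "timeout" $files`

    # pipe external processes together without ever invoking a shell
    run(pipeline(cmd, `sort -t: -k2 -rn`))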


Reduce Julia startup/JIT time for scripts with https://github.com/dmolina/DaemonMode.jl
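
A rough sketch of how it's used, going by the DaemonMode.jl README (the script name is made up):

    # run once in a long-lived session to start the daemon:
    using DaemonMode
    serve()

    # each script run then goes through the warm daemon instead of a cold start:
    #   julia --startup-file=no -e 'using DaemonMode; runargs()' myscript.jl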

I appreciate the "Why You Shouldn't Use Julia for Administrative Scripts" section[0] which asked exactly the questions I would have asked.

The choice of (non-Bash) language for writing command-line utilities is in a bit of an odd spot right now. Python is installed almost everywhere, but the dependency on a runtime plus venv oddities bring their own set of problems. Java has similar runtime needs, though things might improve with the initiatives around native binary compilation (even if bundling the runtime may not produce exactly lightweight executables); it's also not super popular among the younger crowd. Perl used to be a hot favorite in this space, but I don't think a lot of people are writing new stuff in Perl, even though it is still present by default almost everywhere. Go is almost perfect here, except I don't want to deal with 3x the boilerplate. Personally I think Rust isn't a bad choice (libraries like clap hugely reduce the boilerplate), but the learning curve makes it a harder sell (even though for basic utilities I don't think there would be too much wrestling with the borrow checker). Another choice that comes to mind is Nim; I think it is very well positioned, except a lot of people don't even know about it, so it's a hard sell. And even among those who do, everyone is looking at everyone else to take the initiative and adopt it in a corporate environment at a non-trivial scale.

[0]: https://github.com/ninjaaron/administrative-scripting-with-j...


I wish there was a similar write-up for scripts that run on Node. I've been using nbb[1] for scripting, and although it all runs through Node.js, it is fast to start up and quick to prototype/debug scripts with. The best part is that in CI I can simply `npx nbb path/to/script.cljs`. Things get clunky if I want to use anything outside of the Node stdlib (or the few things that come out of the box, like tools.cli for command-line argument parsing), though, since then you need the dreaded node_modules folder around.

[1] https://github.com/babashka/nbb


Why is nbb or anything else necessary to run Node scripts? Compared with Python, the Node runtime does not come with all batteries included, but it is rich enough to replace shell scripts. It even supports things like file-system change notifiers, while Python requires external dependencies for that. So I run the scripts directly with node.

Haskell is useful to stop other people from breaking your scripts.

> Python is basically almost everywhere installed but the dependency on runtime + venv oddities bring their own set of problems

Not that it changes your core message, but if you need scripting dependencies, you should use pipx[0]. Pipx will manage isolated virtual environments for each utility. ‘pipx install black’ will create a dedicated venv just to hold black. Invoking ‘pipx upgrade-all’ will scan all venvs for out of date utilities.

[0] https://pypa.github.io/pipx/


True, but then the script needs to install its dependencies dynamically on a given host, and hosts may have different configurations/network restrictions etc. Also, that results in scripts having to include install-related boilerplate that is beside the core functionality, or we need a separate script/stage that sets it all up. Admittedly these are solved problems, but it all just feels a bit more messy compared to, say, a Go utility which can just be dropped in and run.

My personal rule of thumb is that if it requires something not in the Python standard library, it is no longer a script and requires a real development process.

I find Nim super interesting and feel like it has a lot of potential. I wish it was more mainstream.

I use PHP for administrative tasks on a server that runs WordPress and a few other PHP apps.

My only reservation is that the PHP APIs for starting external processes are awkward: they either go through a shell and require careful escaping of arguments, or they are too low-level, with explicit fork/exec. I took care of that with a few utilities, but for a quick replacement of shell scripts this is a significant drawback.

On the other hand, modern PHP supports types out of the box. That turned out to be a big plus when those administrative scripts grew in size.


While it moves it from a single script to needing a composer.json and vendor directory, I find the Process component [1] has removed the footguns I used to encounter.

1. https://symfony.com/doc/current/components/process.html


Java's "runtime needs" are a lot simpler than Python's - it's very easy to create a jar that works on any JVM back to 8 (which everyone has) or even earlier - although the language is unpleasantly verbose for ad-hoc scripting. (I work in Scala most of the time and write command line utilities in it too; the JVM startup is tedious but not worth the overhead of switching to a different language). Ruby also used to be a popular option in the sysadmin space; it's pretty similar to Python but hasn't shot itself in the foot quite as vigorously on the dependency management front.

Something like Swift might be a good option in the "OCaml featureset + boring syntax + low startup overhead" space, which is what we're really looking for.


I like Scala a lot, but I think it gives people too many options that can be abused to show "cleverness", which is the last thing you want in a utility program. I stick to Scala basics, and it is wonderfully concise compared to Java.

Ah yes, Swift is another excellent choice. I didn't like it much originally (a few years ago), but based on a recent HN comment I went back and did the introductory tutorial. It has come a long way and is very pleasant to use. If I had to guess, its close association with Apple might bring "iOS"/"Mac"-only connotations that make folks not consider it. Don't know how I forgot about it.


Slight tangent: what’s the easiest Java runtime to set up? Is there a single-binary JRE that’s fully open source? Edit: and not from Oracle?

This is just an idle question I’ve had, not related in any way to Python.


Generally, I'd use Eclipse Temurin from Adoptium, if I'm not already on a Linux distro that has a JRE package.

I use Deno for this. It's the only scripting system where you can actually have a single-file script with no compilation step and use third-party libraries reliably.

You also get great static types which is nice.


Interesting! I knew F#, C#, and Elixir supported this scenario, but good to know Deno has this as well :)

Do F# and C# really support it? How do you specify dependencies from within the script?

Yeah! F# has native support[1] and C# can do it with a third-party tool[2]. For both, the syntax for referencing a third-party NuGet package inside the script is e.g.:

    #r "nuget: Newtonsoft.Json"
[1]: https://learn.microsoft.com/en-us/dotnet/fsharp/tools/fsharp...

[2]: https://github.com/dotnet-script/dotnet-script

Very interesting, thanks for the info!

Deno is pretty nice for that. There are other systems that automatically satisfy dependencies for single-file scripts. I have gathered a list at https://dbohdan.com/scripts-with-dependencies. I am not sure about their individual reliability, because I have only written one example for most of them, but some are old, have users who rely on them for work, or both.

Great list! I tried a couple of these and found that a big issue is IDE support - it might be worth checking which ones have it.

Also it seems like F# and C# should be on the list (see other comments in this thread).


Thanks!

It didn't cross my mind to add IDE information. (I don't use an IDE, which may be a mistake. I want to give IntelliJ IDEA a serious try.) I'll be honest: I probably won't add this information. Sorry. N projects × M IDEs is a sizable number of fields to keep accurate by hand, and I've learned this kind of maintenance is best avoided.

I have tried https://github.com/dotnet-script/dotnet-script. It seemed like it could not download dependencies when you ran the script, only reference already installed dependencies. At that point I stopped and did not add it to the list, since it would not qualify. I may have been wrong. I have a mental note to look at it again.

I'll see what F# does. It may work differently from how I understood dotnet-script to work. (It isn't the criterion for inclusion, but as someone who enjoyed writing Standard ML and not so much Haskell, I am actually interested in F#.)


Yeah, fair enough. Based on my experience, anything that's a third-party tool won't have IDE support and is really a bit useless. Even Deno is slightly annoying because it obviously conflicts with the official TypeScript IDE support, so at least in VS Code you have to enable it on a workspace-by-workspace basis. And I've had one project where I had "normal" NPM-based TypeScript and a Deno script, and it just doesn't work.

F# support looks really interesting though; I'm definitely going to check it out.


I was wrong about dotnet-script: it does download dependencies. (Which makes more sense than the alternative. It would need a good reason to not copy this feature from F#.) I have added C# and F# to the list.

Nim is a strange choice to come to mind when there is Lua: a much more approachable and lightweight glue language for the numerous C libs and tools that exist on any OS. If you go in that direction, then D seems like a better choice that could also come to mind: basically C, but with a GC. All of these are good choices for system scripting any day.

I am not saying this as an expert, but I needed to set an environment variable from Lua and it's not supported. You can read them but not write them. (There's getenv but no setenv; I just looked it up, and it's because ANSI C doesn't have the latter.) I believe it has something to do with cross-platform compatibility. This is pretty basic stuff. Maybe it's just the standard library, and people import something for it.

Are Python dependencies a problem for scripting, given that you'll almost certainly use only the standard library? What can Bash do for which you would need a third-party library in Python?

Argument parsing with argparse is a bit more tedious than with click, and the "requests" dependency is very practical for HTTP requests. You could need a YAML parser...

Babashka[0] is also a nice option for scripting, if you're a Clojure/Lisp fan.

[0] https://babashka.org/


Julia seems like a nice programming language. Is it still worth learning, though, since ChatGPT can write all software now?

Agree. Me think me learn english, but me too think ChatGPT come, then why learn English? So me not learn now, only wait for ChatGPT.

Bizarro hate Superman, but ChatGPT hate Superman better, that mean Bizarro love Superman!

> Agree. Me think me learn english, but me too think ChatGPT come, then why learn English? So me not learn now, only wait for ChatGPT.

> write above proper english

I agree. I thought about learning English, but I also thought since ChatGPT is available, why bother learning it? So I decided not to learn it now and just wait for ChatGPT.


Is it still worth posting, since ChatGPT can write all comments now?

If you know Python, you can learn Julia in a day.

Generally agree, but if you write pythonic Julia it can lead to performance issues (in particular for numerical code, which I realize is not the focus of this post). It’s taken me a while to unlearn the numpy/torch style of heavy array broadcasting in favor of writing more loops and functions.

So, in theory, someone who’s fairly proficient at python but never used numpy/torch could pick it up quite easily without having to unlearn things?

Just asking because I don’t want to start poking at the ML stuff but don’t have any experience/baggage to go along with it.


You can mostly convert Python code to Julia line by line (in fact ChatGPT can do it, albeit not very well since there aren't that many Julia examples in the training data). Sometimes you have to look up a function that doesn't have the same name. Writing new Julia code is often even easier than Python since you can write loops in numeric code and have good performance (also you have opt-in rigorous type checking which helps define interfaces and catch bugs, an excellent package manager, as well as sane threading/multiprocessing, contrary to Python).
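
For example, a plain loop is idiomatic and fast (a toy function, just for illustration):

    # no need to vectorize everything as you would in numpy
    function sum_of_squares(xs)
        s = zero(eltype(xs))
        for x in xs
            s += x^2
        end
        return s
    end

    sum_of_squares(rand(10_000))  # compiled to native code on first call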

However, I must caution against trying to replace Torch within Julia. The Julia ecosystem does not have these huge libraries for neural networks yet. Building such a thing requires a huge investment (by a company like Google or Facebook) and Julia is not there yet. You can do neural networks with GPU support in Julia (even with extremely fancy autodiff capabilities), but it's not "production ready" in the sense that you will have to deal with a quickly moving ecosystem and probably end up contributing to it, if you stick with it long enough to build something interesting.

On the other hand, if you ever wanted to add a "neural network term" to a PDE and simultaneously solve it and train the network, Julia is the place to go. It's crazy what kinds of modeling you could potentially do with stuff like that.


Yes, I agree with the sibling poster that it’s a pretty straightforward transition, especially without the array broadcasting baggage. Personally I think programming in Julia is usually more fun than Python, and my Python code has also benefited from learning Julia.

Maybe I was too broad in my initial statements about pythonic code, because comprehensions, for example, work pretty much the same as in Python, and they are fast. It’s just that if you’re used to mind-bending array broadcasting tricks in numpy or whatever, there’s usually a more Julian way to get it done with better simplicity and performance. BenchmarkTools.jl and some of the standard library tools are also really great for getting a sense of what matters.
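
For instance (a toy comparison; exact timings will of course vary by machine):

    using BenchmarkTools

    xs = rand(10_000)
    @btime sum(abs2, $xs)           # functional style
    @btime sum(x^2 for x in $xs)    # generator/comprehension style, comparably fast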


Prompt: "Write software that does exactly what the Product Owner wants. Here is his email: *copy-pasted email*"

The result was...disappointing.



Would rather use Janet for that

https://janet-lang.org/


reason?

Not knowing about TXR Lisp, I would guess. :)

Who?

As a data scientist, I’m constantly disappointed that Julia didn’t take off the way I was hoping (and thought) that it would when I was doing some combination of machine learning and HPC as a grad student back in 2013.

It was just so pleasant and enjoyable to program in, and there were many things I was able to do with the language that would have been much more difficult in other languages (as one example, I wrote a generated function to compute the irreducible matrix representations of compact groups; Julia generated a unique function with optimized bytecode for each distinct matrix dimension).

Like the post author, I also found Julia easy to use as a general purpose programming language. I’m still hoping adoption will gain momentum at some point. Python certainly took a while to get going.


I'm really curious what a new language would need to have to start stealing users from Python in the data domain. I was also a bit surprised that Julia didn't get more traction, as I felt it was a more enjoyable experience, but obviously it doesn't have the extensive community and ecosystem that Python has so it was fighting an uphill battle from the start. Interop with Python (like R's reticulate) would be helpful when targeting Python users, so they could still draw in their favourite packages, but Julia has that already.

In my new/current role, I primarily use Python but would prefer not to; however, it's a tough case to make against Python when it's become the default data language (along with SQL).


It already has interop with Python via https://juliapackages.com/p/pythoncall. It's pretty seamless for the most part, I think.
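
A minimal sketch of what that looks like (assuming PythonCall is installed in the active environment):

    using PythonCall

    np = pyimport("numpy")          # import a Python module
    a  = np.linspace(0, 1, 5)       # call it like any Julia object
    pyconvert(Vector{Float64}, a)   # convert the result back to a Julia array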

> I'm really curious what a new language would need to have to start stealing users from Python

Backing of a large tech company. I tried to get something going at the one I worked at, but it went nowhere. To my surprise a coworker who was also one of the early top contributors to Julia mentioned to me that the language was unlikely to ever take off internally because the company had already invested so many resources into improving and building upon Python.


It's not only a question of community but also of inertia in large companies. Where I am, Python is allowed and I have it everywhere, but I'd have to jump through a lot of hoops to get permission for Julia.

Same, I thought I had found the ideal language for ML/DS/generally interesting stuff, and it simply lacked people, libraries, tools, etc., but that would come in time. It seems Python's network effect is just too large for any language to overcome.

Queryverse is just sitting there like it isn't the easiest way to sift through data. While the ML stacks are not as popular as the Python options yet, FluxML still abstracts away a lot of the ugly parts of scaling a problem.

Have a gloriously wonderful day =)


I feel the same way. As a bioinformatician, I find Julia very close to the perfect language. It addresses real everyday problems I face using Python, our lingua franca.

So why hasn't it gotten more popular in science and engineering? It seems to have stagnated in popularity since 2020 for no particular reason - the language itself has not stagnated, and it gets better every year. I'm honestly bewildered.


Well, more people don't use it because more people don't use it. Network effect and inertia.

Sure, but the same could be said about Python. In the early 2010s, Perl still reigned supreme in my field, yet Python usurped it, despite having no concrete advantages over Perl except a better design.

And of course, Perl also replaced an existing language. It's not like people settled on Fortran in the 1950s and then never moved on.


Yes, but the situation seems to be different now. Perl wasn't as widely used, especially at universities, for ML, DL, Statistics, etc. as Python is now.

I think Julia started getting popular a little too early; I had a few bad experiences and crashes (and the crashes were almost impossible to debug, as they gave backtraces into the compiler code). There are only so many hours to try new things.

I'm quite hopeful that language models for code might actually lead to more people getting comfortable with multiple languages, and especially with some kind of hopping back and forth. I think the prospect of starting your script in Python (which is "comfort food" for many, including myself), using an LLM to translate it over to Julia, probably doing some minor debugging, and then taking off from that starting point instead of a blank file could be a nice workflow for people who want to dip their toes in.

Not going to happen because the majority of Python users don't know Julia exists and see no reason to switch to anything else.

If you can use LLMs to translate your code from one language to another, why would you translate Python to Julia? Even the most ardent Julia supporters will admit that Julia code will be slower than optimized C and C++ code. Why not just use an LLM to rewrite your Python into those languages?

> Julia code will be slower than optimized C and C++ code

This isn't true. Optimized Julia will pretty much always tie optimized C/C++. This shouldn't be surprising: they use the same compiler back end (LLVM), run on the same hardware, and both let you use inline assembly where needed. Octavian.jl often beats MKL at matmul, and Julia's math library is written in Julia without losing performance for it.

Rewriting to Julia instead of C/C++ has the benefit that the code is still readable and improvable by scientists who wrote the code in the first place.


Yeah, that's what I'm basically thinking: for scientists who already like Python because it's readable and mostly matches the mental model of what they care about, being able to catalyze the switch to a different, faster language that still reads nicely could really drive adoption. But it's a stretch for sure!

This has been said many times, but I think it is because of not having AOT compilation as an option. Due to this, not only do we have the "time to first plot" problem, but apparently the memory used by the process is also large. I love Julia and use it for my data and computational work. But for CLI apps and web apps I am learning Rust, because startup speed and memory usage matter there.

I'm confused by this statement because AOT compilation is an option, and it's getting better over time.

https://julialang.github.io/PackageCompiler.jl/stable/sysima...

In Julia 1.9 and beyond, this process now also becomes modular with a native library being produced per package.

The difficulty is determining what exactly must be compiled, since Julia's methods are polymorphic. There are tools to help with this; one of the latest is SnoopPrecompile.jl.
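
A minimal sysimage sketch (package names and file paths are made up):

    using PackageCompiler

    # bake the chosen packages into a custom system image so scripts start warm
    create_sysimage([:DataFrames, :CSV];
                    sysimage_path = "warm.so",
                    precompile_execution_file = "warmup.jl")

    # then start scripts with:  julia --sysimage warm.so myscript.jl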


Next step:

using SHELL=julia in your makefiles.

(/s)


The example of opening files within a do block could be explained better. [1] The article implies that the file gets closed when it goes out of scope, but doesn’t explain how it works.

[1] https://github.com/ninjaaron/administrative-scripting-with-j...
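
For reference, the do-block form is essentially sugar for a try/finally (the file name here is made up):

    open("data.txt") do io
        println(readline(io))
    end

    # roughly equivalent to:
    io = open("data.txt")
    try
        println(readline(io))
    finally
        close(io)   # closed even if an error is thrown, not merely when `io` goes out of scope
    end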

