Shows and compares graphs plotting performance vs. code size, based on the programming language benchmarks at http://shootout.alioth.debian.org/. Note that 'dependability' here means 'the consistency of the performance or code size'.
Few solid conclusions can be drawn: small performance benchmarks are hardly indicative of either the code size or the performance of most actual projects in a language.
The references to Ruby here aren't entirely what they seem. They appear to refer to Ruby in the older shootouts rather than the recent ones: that is, the uber-slow Ruby 1.8, rather than the Perl-beating, PHP-matching Ruby 1.9, which corresponds more closely to "yarv" on this list.
It's great to see Ruby so much faster with 1.9, because 1.8's speed was such a stumbling block in the data-intensive work I pushed it towards.
However, don't count your chickens just yet with regard to Ruby 1.9 being "Perl-beating" ;-) Don't you think it's odd that Perl 5.10.0 sits only just above Ruby 1.8.7 in the current Alioth shootout? A quick look at the tests turns up a known 5.10.0 performance bug; fixing it in a quick local test here makes all the difference ;-)
Very good graphs. Aside from X (time) and Y (code size), the center of the star implies "typical" performance, and the relative spread of the star shows specific data points. Ruby, for example, shoots out from the bottom right - to me, this says, "This is quite expressive, and if you know what you're doing, you can get reasonably good performance". It doesn't show memory usage or sample size, however - Rebol probably has significantly fewer samples than Python, for example, but it only looks like half as many.
It's also interesting how the LuaJIT star has about the same shape as the Lua star, but with 1/3 the width (which is consistent with my experience); Psyco relates similarly to Python. Surprisingly, the Lua star's center appears to be at the same X as GHC's, but lower down, and LuaJIT's overlaps OCaml's. While I can accept well-written Lua being approximately as expressive as Haskell (depending on the problem), I don't think it's in quiiiite the same ballpark as OCaml, speed-wise.
The most interesting shape, to me, is Io's: While the default performance seems pretty poor (the right edge meaning 8x as slow as the left edge), there are bands shooting all the way to the left, without adding any real height. I don't have any experience with Io (I'm curious, but have never had success porting it to OpenBSD/amd64); can anybody here comment on this?
There seems to be a mistake. If you compare Io to JRuby [1], you'll find that Io is always the slower one, even though the picture in the article suggests otherwise.
"Ruby, for example, shoots out from the bottom right - to me, this says, ... if you know what you're doing, you can get reasonably good performance" Sorry, but it does not say you can get reasonably good performance, it says this version of Ruby is slow, one of the slowest languages in the Shootout. And the graph itself says nothing about the skill level required to obtain the exhibited performance.
> And the graph itself says nothing about the skill level required to obtain the exhibited performance.
In all fairness, neither do most people promoting C++'s performance-at-any-cost design.
I think a large spread in performance does correlate with the skill level required, though -- it means that some people using the language get good results, but that many also get very bad ones. Ruby is a really slow language ("reasonably good" < great), but it's still good enough for many purposes, provided your algorithms are reasonable.
Ruby is not a really slow language; it has some really slow implementations. Ruby has a high cost of entry for a VM/tool developer because it has a relatively byzantine syntax which has been hard to standardize. You can write code where the same tokens yield entirely different syntactic elements based on something as arbitrary as, say, what day of the week it is. You can do that in Python or Smalltalk too, but in Ruby you can do it by accident, with statements that look very innocent. (This is covered right in the Pragmatic Programmers' Ruby book.)
If dealing with Ruby's syntax were easier, Ruby implementations would progress faster.
That sounds more like, "its design interferes with fast implementation" to me. If there's a major difference between that and "it's a slow language", I'm really curious what it is.
You missed a meta-level. The design doesn't interfere with having a fast implementation. It interferes with implementing a fast implementation.
"It's a slow language," implies that there's something special about Ruby that precludes (relatively) fast implementation. There's nothing special about Ruby like this. Ruby is a very nice remix of language features that existed before. Getting a language to be fast can require many iterations. Ruby's syntax is a high barrier to entry to implementers, so fewer eyeballs have been looking at the problem.
Smalltalk was designed to be quickly implemented by a single programmer. Implementing a "Bluebook VM" was treated as a rite of passage for support engineers at at least one of the vendors. As a result, people have implemented many, many variations on Smalltalk. (Including a VM with a built-in OODB, another that can compile down to a C++-compatible DLL indistinguishable from one written in C++, another that ran on the old Palms, one with optional typing, and many others.)
I think it's fair to say that writing a Ruby VM is probably an order of magnitude harder.
Lua isn't usually associated with the functional languages. Besides mainly being promoted for embedded / extension use, I think the single biggest reason for this is that, while it has tail-call elimination, closures, first class functions, etc., you still have to say "return x" at the end of functions. I do a lot of FP-style stuff in Lua, and while the language doesn't actively resist it the way Python does, it's not the main direction its design flows.
It seems like a small thing syntactically, but languages that return the result of their last expression by default (I think of them as "expression" languages), and that require an explicit block for side-effects (begin in Scheme and OCaml, progn in Common Lisp, etc.), tend to get idiomatically used in a functional manner, while languages that default to a series of statements with side-effects and require an explicit return at the end ("statement" languages) typically do not. While you can use several of the latter languages for functional programming, only the former are functional "by default".
Saying "function(x) return x + 1 end" is a tiny bit more trouble than "fun x -> x + 1" or "(lambda (x) (+ x 1))", so while lambdas still get used where they're the best fit (as arguments for map or filter, for example), they're less common overall. Defaults influence a language's overall style at least as much as what's theoretically possible. Python's concept of what's "pythonic" seems to be the most explicit acknowledgement of this.
Incidentally, I would love to see a language that crossed Lua with the ML family: a small, portable, clean implementation that interfaces easily with C, but with type inference, ML's top-notch module system, and (ideally) Haskell-style type-classes. OCaml is such a good language that I'm willing to tolerate its warts, but I'd strongly prefer a minimal ML-like language. (FWIW, I haven't used SML, though that's just because I do most of my programming on OpenBSD/amd64 and the smlnj port is i386-only.) When standalone, Lua is best suited to small and medium projects; its historical emphasis on embedding in an existing project has made a module system for standalone use a bit of an afterthought. As a project gets larger, static checking and stable module interfaces become far more important, and this is one area where the ML languages really excel.
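As a rough sketch of what the praise for ML's module system is about (the names here are invented for illustration): signatures pin a module to a statically checked interface, and functors build new modules from old ones, all verified at compile time:

    (* A signature: the statically checked contract for a module. *)
    module type COUNTER = sig
      type t
      val zero : t
      val incr : t -> t
      val to_int : t -> int
    end

    (* An implementation; its representation is hidden by the signature. *)
    module Counter : COUNTER = struct
      type t = int
      let zero = 0
      let incr n = n + 1
      let to_int n = n
    end

    (* A functor: a module parameterized by another module. *)
    module Logged (C : COUNTER) : COUNTER = struct
      type t = C.t
      let zero = C.zero
      let incr c = print_endline "incr"; C.incr c
      let to_int = C.to_int
    end

    module LoggedCounter = Logged (Counter)

This is the kind of interface stabilization that pays off at 200 kloc rather than at 20 lines.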
> you still have to say "return (val)" at the end of functions
We have a bunch of code that compiles to both Common Lisp and Javascript, and those nasty "return" statements are one of the two biggest mismatches between the two languages (the other being Javascript's weird semantics around null, false, 0, and ""). One of these days I intend to try inserting them all automatically.
Indeed. It's surprising how deeply eliminating return changes the language semantics.
I've postponed learning Javascript until recently, because my work hasn't usually been web-oriented, but it's interesting how similar Javascript and Lua are. They seem to have been aimed at the same target, but for historical reasons Javascript's development got frozen early, while Lua had time to iron out many similar design flaws.
(In Lua, undefined and null are both nil, nil and false are "false-y", and everything else, including 0 and "", are truthy.)
I think you hit the nail on the head earlier. It isn't "return" as such, it's expressions vs. statements. Backus got it right. What's surprising is that the superior way gets the tiny minority of usage. (Not so surprising if you know the historical reasons.)
JS is not so bad. We're lucky that what we're doing (writing high-level FP code that compiles to JS and runs in pretty much every web browser in the world) is doable at all.
JS even has its own version of progn: "(a,b,c)" evaluates to c (after evaluating a and b). We use that pretty heavily. Alas, it has two problems: there are some things you can't do inside such an expression (like declare new variables or use for loops), and it makes the JS harder to read and debug. Otherwise we'd probably just compile everything that way.
Can you get around (at least some of) those restrictions by wrapping things in function expressions? Here's a for loop:
((function(){ for (var x in [1,2,3]) { print(x); } })(), 42)  // prints the indices "0","1","2", then evaluates to 42
Yes, and we have some macros that do just that (a Greenspunned version of half of CL's LOOP comes to mind). But it's not great as a general solution because of what it makes you give up in performance and readability. Also, it doesn't solve the problem of declaring new variables.
Every now and then we hit upon a new abstraction that lets us remove another chunk of ugliness from our code (while still generating acceptable JS) and I'm hoping that the trend will continue, especially if we can get rid of all those "return"s.
Isn't "dependability" here mostly a stand-in for the language making more assumptions for you? In a language that makes a lot of assumptions, it's easier to create something where assumptions conflict somewhat invisibly.
If you've looked at Peter Norvig's spell checker shootout, you won't be surprised much. E.g., the story on Common Lisp and Scheme implementations is the same here: MzScheme is the mediocre best of an unimpressive lot. (I want to see Clojure and Arc on this.) Python looks great again. F# looks weak here but wins big in the spell checker with reasonable-looking code. Sounds interesting.
What's the deal with Squeak? I had thought that it was very concise.
The meta-message here is best of all: people are measuring language conciseness.
Well, Arc's running on top of MzScheme, so it's likely to be a bit slower. Also, Stalin in the left column is a Scheme implementation, albeit one that is very restrictive for the sake of raw speed.
Also, it's worth noting that the features that make large systems' codebases manageable (such as good module systems) are different from the features that let you pare down small scripts further. It's hard to show features supporting conciseness-in-the-large in a one-page benchmark, so IMHO conciseness-in-the-small is often greatly overemphasized. Being able to knock 2 lines off 20 is often just a parlor trick, while knocking 50 kloc off 200 kloc can mean life or death for a project. The programming world would be a happier place if the size of every legacy C++ / Java codebase were cut by a double-digit percentage.
I don't know much about the Shootout, but my guess is that, if someone wants Arc there, they need to go and write whatever programs the Shootout requires you to write in order to participate in the Shootout.
I'd try it, but I don't know Arc (or Lisp yet, for that matter). If someone here knows Arc and wants to see it in the Shootout, I suggest they look into the faq: http://shootout.alioth.debian.org/u32q/faq.php
Just a guess, but: Squeak only uses text files as an export format, and that format is more verbose than file-based source code would normally be. If you only counted the code and not the metadata, it would probably do better.
For this sort of comparison, mostly-functional multi-paradigm languages like OCaml and Common Lisp (SBCL) should be run through the benchmarks twice: once with imperative code, once in the functional idiom. The often-repeated complaint (of which I'm skeptical) about functional languages is that they're only efficient when ugly imperative features are used, and it would be good to confirm or dispel that suspicion.
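To make the two idioms concrete, here's a minimal OCaml sketch (just an illustration; the actual benchmark programs are far more involved), computing the same sum of squares both ways:

    (* Functional idiom: no mutation; fold over an immutable list. *)
    let sum_squares_functional xs =
      List.fold_left (fun acc x -> acc + x * x) 0 xs

    (* Imperative idiom: a mutable reference and a for loop over an array. *)
    let sum_squares_imperative a =
      let total = ref 0 in
      for i = 0 to Array.length a - 1 do
        total := !total + a.(i) * a.(i)
      done;
      !total

    let () =
      Printf.printf "%d %d\n"
        (sum_squares_functional [1; 2; 3])
        (sum_squares_imperative [| 1; 2; 3 |])

Both styles compile in the same language, so a benchmark really could report them separately.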
It's more complicated than "functional is slower than imperative". Functional data structures are sometimes less immediately efficient, but they buy you persistence: since they're immutable, they share subsections of the structure, and past versions remain accessible as snapshots while you "modify" it. Whether that gives you needed features or just means extra work and cache misses depends on what you're doing. If you have a functional data structure and you want to add "undo" or backtracking, you're already most of the way there, while adding these to an imperative data structure means extra work to keep track of state changes.
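Here's a minimal OCaml sketch of that "most of the way there" point, using plain immutable lists as a stand-in for richer persistent structures: each update returns a new version that shares structure with the old one, so "undo" is just holding on to an earlier version:

    (* Pushing builds a new list sharing its tail with the old one;
       the old version stays intact. *)
    let push item stack = item :: stack

    let () =
      let v0 = [] in
      let v1 = push 1 v0 in
      let v2 = push 2 v1 in        (* v2 shares its tail with v1 *)
      let after_undo = v1 in       (* "undo" = keep the old pointer *)
      Printf.printf "v2: %d items; after undo: %d\n"
        (List.length v2) (List.length after_undo)

With a mutable stack you'd have to record and replay inverse operations to get the same effect.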
_Purely Functional Data Structures_ by Chris Okasaki is, by far, the best collected resource, if you want to read further. (His thesis (http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf) is the kernel of the book.) Also, the fourth chapter of Developing Applications with Objective Caml (English translation: http://caml.inria.fr/pub/docs/oreilly-book/) discusses functional and imperative styles' strengths and weaknesses.
Reducing everything to runtime benchmarks obscures how, in practice, the trade-off is often more like "it was a huge pain in the ass to get working without bugs, but it's 2% faster" versus "it took twenty minutes and was correct on the first try". (Also, whether or not you use it in production, OCaml rules for prototyping complex data structures, in the same way Erlang does concurrency and C++ does linking errors.)
V8 has propelled JavaScript into the coveted bottom-left corner, which is awesome. SquirrelFish Extreme would probably be right up there with it if it were included.
More reason to believe JavaScript is going to be huge on the server-side. As concise and (nearly as) powerful as Python, Perl, and Ruby, but getting faster at a much more rapid clip...and already required for the front-end, so everybody on the team is already familiar with it. Library coverage will take another couple of years to reach a reasonable level, but I'd bet on it happening.
I agree. Once I learned how truly powerful Javascript is (once you get past the horrible DOM), I knew it could one day become a general-purpose scripting language and not just a web one. Once someone makes a good command-line Javascript interpreter, web apps will follow as soon as someone creates a CGI extension for it.
You need to look at how many data points each language implementation actually produces. gcc runs all the programs, whereas Java steady-state (the one that looks faster than gcc) lacks a lot of data points (i.e. programs). As you can see in the diagram, gcc's star is pulled towards the right by one data point (at least it looks like one). If Java steady-state ran all the programs, its star might shift to the right as well. Or it might not. It's just not comparable as it is; you need to look at the actual numbers instead of the diagrams.
My own benchmarks that I run on our heavily algorithmic long running code consistently show Java about 10% to 20% slower than gcc (C++) and memory usage about twice that of the C++ implementation (excluding the JVM memory itself). C and C++ performance depends heavily on avoiding malloc/new. C/C++ code that gratuitously allocates and frees heap memory tends to be much slower than Java code.
A two-line fix to compile in something like SmartHeap typically makes that memory-allocation penalty go away. Not that it's hard to avoid unnecessary allocation/deallocation.
MLton may not allow separate compilation (it does whole-program optimization, at least), and it's pretty slow even on small programs, but if you were really using just the Standard ML language, you could use a faster compiler for development. I'd probably want to use some extensions, though, and I'm not sure the intersection of MLton's extensions with some other implementation's would satisfy me.
MLton seems to be used for optimized compilation of release builds, while general development is typically done in another SML environment, such as smlnj.
I don't have really concrete experience with SML, just OCaml, which also has its share of warts. It's a fantastic language in certain niches, though; generally, the sort of niches for which dynamic / "scripting" languages are usually a poor fit. (I wouldn't write a serious compiler in Python, for example.) They're very complementary.
I think the ML family has a lot of advantages, but the community is pretty closely tied to academia (much like Haskell's, but without the buzz). While that in itself is probably neither good nor bad, in this case a lot of the terminology is especially math-y; ML has many features programmers would find tremendously useful, except that they're usually discussed in terms of Greek letters, which is insular. I got a copy of The Definition of Standard ML from Amazon, hoping to get further insight into the language's design, and found it completely impenetrable. (Then again, I'm not a grad student.)
Newer languages seem to be adopting type inference (finally!), but as far as I know, nobody has really run with the design of its module system.
I thought this was the most telling line of the article: "Ultimately the first factor of performance is the maturity of the implementation."
That supports a common conviction held by fans of functional programming: if all of the years of arduous optimization that have been poured into GCC had instead been poured into (say) GHC, then Haskell would be even faster today than C is.
That is, to many people functional programming languages seem to have more potential for performance than lower-level procedural languages, since they give the compiler so much more to work with, and in the long run a compiler can optimize much better than a programmer. But so much more work has been put into the C-style compilers that it's hard to make a fair comparison. It's still hard, but this experiment seems to give some solace to the FP camp.
Haskell may potentially be compiled to within epsilon of C when using strict evaluation + mutable state + unboxed types (look at the shootout implementations; the majority are in this style), but lazy evaluation and functional data structures have a real cost.
It's not merely that the ghc developers have been insufficiently clever and diligent.