I believe this is the place to ask a question I've been wondering about. WebAssembly currently doesn't support goto or labels, and several people are pushing for it: https://github.com/WebAssembly/design/issues/796
Now, I'm aware that mutually recursive functions can be used gracefully to implement state machines, and that the issue of jumping into the middle of a loop from a goto seems to be the main sticking point.
The former (mutually recursive functions) seem to solve that problem, not necessarily in a fully general sense, but perhaps while preventing some of the nastiest things goto can do.
I'd like to know whether mutually recursive functions require the underlying machinery to provide gotos. In my view, that would fully justify them. If not, then tail recursion might be the only thing missing from WebAssembly.
Why would you be worried about the "nasty things Goto can do" in an assembly language? IMO assemblies are about exposing the underlying machine's capabilities (be it physical or virtual) in the most straightforward way possible. I don't expect an elegant abstraction, I leave that to the higher level languages.
I don't know much about WebAssembly so I don't know if gotos have a place or not but if they're useful and the only reason for not adding them is cargo-culting "gotos are teh 3v1l" it's a bit silly.
I respectfully disagree that Web Assembly really "exposes the underlying machine capability" in any meaningful sense. The situation we are in today is that many assembly ISAs are no longer fit for purpose. A modern C compiler will spend a lot of work recovering data-flow dependencies (e.g. SSA form) only to then discard them when outputting e.g. x86 assembly. The CPU then has to recover much of this information again for out-of-order execution etc.
IMHO, the Web Assembly folks should focus on a simple, elegant core language that is easy to define and easy for a JIT to optimise. The machine architecture, whatever that may be now or in the future, should be secondary. Personally, I would never have called it "Web Assembly" either.
Is it the case that common ISAs are too low-level? Or are contemporary CPUs too complex? Are more recent ISAs, e.g. RISC-V, better at preserving high-level structure?
In any case I guess at some point one has to drop the high-level stuff and get real.
That sounds very reasonable to me, but if my (very limited) understanding of WebAssembly is correct then it's mainly meant as a target for compilers, not something you'd routinely write yourself unless you wanted very tight control over the code or maybe micro-optimizations. As such, it doesn't make a lot of sense to object to certain features from a code maintainability/language ergonomics point of view, IMO.
I'm not saying anything about what WebAssembly should or shouldn't look like, I haven't got the faintest idea about that, I was just criticizing the argument of the parent about the "nastiness" of goto that felt out of place in this context. If 99.99% of all wasm code is generated by compilers then it should mostly aim at providing a target that lets compiler output efficient code, not something that would be nice for humans to write or add arbitrary limitations to prevent coders from writing "nasty" code.
> it's mainly meant as a target for compilers, not something you'd routinely write yourself
Yes absolutely. That's why they should not cave in to pressure to add semantically redundant features just to make a particular project or application slightly more convenient.
In order to meet the objectives of easy to define and easy to optimise, it will be necessary to agree on a core set of foundational primitives and stick to them.
> The machine architecture, whatever that may be now or in the future, should be secondary. Personally, I would never have called it "Web Assembly" either.
It wouldn't be that surprising if in the future we have dedicated processing units for Web Assembly. And I think that's what they're aiming at.
Just like Java had a few computers with processing units dedicated to accelerating Java bytecode. But on a much larger scale.
Unconstrained goto like you see in C is scoped out of the feature set in part because it makes safety checks and JIT/compiler implementations a lot trickier. A primary goal with wasm is to provide high security so risky stuff like that got bumped out. In this case it's about flow control in general and not just "goto", though - you can represent some uses of goto in wasm.
It's not a source language. It should be elegant, but not in the same way. An intermediate language should lend itself to mechanical synthesis and analysis (in whichever balance is most appropriate), and shouldn't worry about how writeable and readable it is to humans.
Java bytecode, for instance, is very 'simple' in a way, but no-one in their right mind writes code in Java bytecode assembly language.
There is such a thing as a bad intermediate language. These things need careful and skillful design, but shouldn't be mistaken for source languages.
When Dijkstra wrote "Go To Statement Considered Harmful", he was referring to source languages, not to machine languages or intermediate languages.
It's often said that a tail call is a "goto with arguments" [1]. It's also often said that "lambda is the ultimate X," for many different X [2]. Just adding proper tail calls to WebAssembly would do wonders for its expressiveness [3], in ways that adding lesser forms of goto would not.
[1] I can't find (or remember) who or where this originated! It seems to be folklore now, but it must have started somewhere.
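To make "goto with arguments" concrete, here is a minimal Scheme sketch (my own, not from the papers above): a counted loop where each tail call is effectively a jump back to the top of the loop, with the new loop state passed as arguments.

    ;; each call to loop is a "goto" back to the top, with the new
    ;; values of i and acc passed as "arguments"
    (define (sum-to n)
      (define (loop i acc)
        (if (> i n)
            acc
            (loop (+ i 1) (+ acc i))))  ; tail call: no stack growth
      (loop 1 0))

    (sum-to 10)  ; => 55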
I would say that Scheme's looping constructs are too limited for comfortable use, but tail call elimination makes implementing your own constructs very easy. I have reimplemented Racket's for loops for Guile using named let loops [0], which was not only pretty easy, but also easily understood and optimised by the Guile compiler. The end result is as fast as a hand-rolled named let in almost all cases (I still haven't found one where it is measurably slower, but I have only tested it in the REPL, where the compiler can make lots of assumptions).
For that reason alone I think tail call elimination is a must, especially in today's transpiling world.
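For anyone who hasn't seen named let, a minimal sketch of the building block (not the actual implementation from [0]):

    ;; `loop` names the loop body itself; each tail call to it is one
    ;; iteration, with the new loop state passed as arguments
    (let loop ((i 0) (acc '()))
      (if (= i 5)
          (reverse acc)
          (loop (+ i 1) (cons (* i i) acc))))
    ;; => (0 1 4 9 16)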
ES6 mandates TCE, but in practice it was dropped. No browser other than Safari implements TCE for JavaScript. Apparently the main reason is inaccurate stack traces on exceptions.
I can understand that, but it is a shame they didn't choose to do so for self-recursive functions. That would at least tell you the stack trace up until that point.
An often-overlooked point: if you use tail calls to implement a loop, you aren't really interested in seeing the stack trace. Compare with a standard for loop: you aren't interested in seeing all the intermediate states of the for loop in a stack trace.
The latest spec still talks about tail call optimization. The big objection was to replace implicit tail calls with a syntax to specify functions that might tail call.
With TCO being part of the spec and the potential replacement spec killed off, browsers really need to start getting with the program.
JavaScriptCore (Safari), Node v6, Duktape, JerryScript, XS6, etc. all have proper tail calls. It breaks the community to not implement them. It's not a standard if it's not standard.
I agree, but the companies behind explicit tail calls dropped that proposal, and are refusing to implement the current spec too. Their stupid politics endanger the JS ecosystem.
I have re-implemented Racket's for loops using named let in a way that got me almost to feature parity in a weekend [0]. I could implement Python's list comprehensions the same way without much fuss.
I can't imagine having to do the same using an already-existing looping construct with its own edge cases and subtle semantics.
[0]: I didn't think about post-guards when I started, and adding them now would mean restructuring the quite hairy emitting macro, which I won't do since I don't see a use case for post-guards beyond making in-value a bit more elegant.
However, if iteration and tail-recursion are equally powerful constructs, the implication that recursion is morally superior to iteration seems misplaced.
I think the point was that recursion makes a better foundational primitive. For example, with Java and C#, their (somewhat arbitrary) Iterator and IEnumerator library interfaces are baked into the syntax of the language (e.g. the foreach loops). This is not a good position to be in.
Well, the Java iterator interface has already changed once since Java was released (see the old java.util.Enumeration). The current one also has problems: it isn't clear from the API whether "hasNext" will or will not advance the stream (it often does). There are also issues with using it as the foundation for e.g. async/await coroutines. But it simply cannot be changed or deprecated, as it is baked into the syntax of the language.
It depends on whether the underlying stream supports "peek". If not, in order to determine whether there is a next element, the iterator must consume the next element and hold it in local memory until a call to next.
That can matter in certain cases of large objects, or of slow or side-effecting stream access.
My problem with "iterators" is that they violate almost all OOP practices: they're not encapsulated (their cursor state is exposed by `hasNext`, `next`, etc.), they don't manage their own state (they rely on having their `next` method called), they resist other attempts at encapsulation (using the same iterator in two places will break both, so we must know implementation details like where that iterator is used), their methods are sequentially coupled, etc. With the main reason to do this being improvement to the `for` keyword, which is structured programming not OOP. Urgh.
Far better to have something like `map` (or a `Mappable` interface or whatever), or even Javascript's `forEach` method IMHO.
Iterators (Java-style) encapsulate a forward-only cursor.
Do you feel the same way about stream abstractions? They are also cursor-based. How about network connections?
`Mappable`, `forEach` or any other mechanism that inverts control via a callback is less composable - how would you implement a `zip` function over such streams?
(I'm not a huge fan of Java's Iterator interface specifically. I marginally prefer IEnumerator from .net, which is peekable by default and doesn't have the awkward `remove()` method, which is seldom reliable through transformed iterator pipelines. And for that matter, I'm not that much of a fan of OOP; but an OOP encapsulation of a forward-only cursor isn't problematic.)
> Iterators (Java-style) encapsulate a forward-only cursor.
> Do you feel the same way about stream abstractions? They are also cursor-based. How about network connections?
It's not the implementation of Iterable I'm complaining about (e.g. whether they use a forward-only cursor), and it's not the use of Iterable to implement e.g. a stream processor over a pair of zipped streams, or whatever.
I'm specifically complaining that the Iterable interface is a very poor basis for implementing looping (which is the goalpost location set by the post title). A looping construct like `for (MyClass myElem : myCollection) { foo }` denotes that `foo` should be executed once for each element in `myCollection`, with that element bound to `myElem`. If the implementation does anything else, it's not conforming to what was denoted, which violates the principle of least surprise and can be thought of as spooky action at a distance. In particular, 'skipping' an element (i.e. where the Iterable produces an element but the loop body does not get executed for that element) indicates that the loop's state has been interfered with, i.e. it is poorly encapsulated. The loop construct specifies what to loop over, the loop body can decide to skip a given element using `continue` and `break`, and no lexically-separate code should be able to mess with that (modulo exceptions, etc. if you want those in the language).
To use this looping construct in Java, `myCollection` must be `Iterable`. It is up to the particular class what this means, e.g. what counts as an element, what counts as the next element, etc. For example, `myCollection` could contain an array, and its `Iterable` implementation might return only the odd-indexed elements of that array. That's fine, and wouldn't be a case of "skipping" since the even-indexed elements are never produced by the iterator. In other words, given such an iterator, the construct `for (MyClass myElem : myCollection) { foo }` is implicitly asking to loop over the odd-indexed elements (since what-gets-looped-over is defined by `myCollection`, and in this case we've assumed that `myCollection` only returns the odd-indexed elements).
If we aspire for the for-loop construct to behave in this way (which I would argue we should), then we find that Java's `Iterable` interface violates that encapsulation: the state of the loop can leak out via the Iterable (`myCollection`) such that code that's lexically-distinct from the loop body (e.g. methods called from the loop body, anything called by those methods, etc. as well as anything running in another thread) can call `myCollection.next()` and cause an element to be skipped.
> `Mappable`, `forEach` or any other mechanism that inverts control via a callback is less composable
The first remark I'd make here is that I'd class a `for` loop as a `mechanism that inverts control via a callback`, since a loop body is nothing other than an anonymous "callback" function with a single argument (`myElem` in my example) and dynamic (rather than lexical) scope. I would also claim that a `for` loop inverts control. The aspiration above is that `myCollection` controls what elements the loop body should be run with, yet it is not given control over the running of the loop body. Instead, the control is inverted such that the `for` construct (which wraps the loop body) controls the argument of the loop body, and the `Iterable` is merely consulted each time that `for` decides to perform an iteration (which, crucially, may occur after something else has interacted with the `Iterable`).
Secondly, I would claim that mechanisms like `forEach` are not only more composable than `for` + `Iterator`, but that `for` + `Iterator` is so uncomposable that it can be unsafe to even have two loops exist in the same program! This is a simple consequence of the fact that the state of one `for` loop can leak out via an `Iterable`, that an `Iterable` is a value which can be passed around between arbitrary parts of a program, and that one way to clobber the state of an `Iterable` is to use it in a `for` loop.
On the other hand, not only can we have as many calls to something like `myCollection.forEach` as we like in a program, but we can even nest them!
> how would you implement a `zip` function over such streams?
`Mappable` does not represent a "stream", it represents a collection whose elements can be transformed individually. Alternatively, we can think about it as representing a context within which parameterised computations can be performed (I think this second perspective makes it clearer how `Mappable` is a natural generalisation of `for`, where our computation is parameterised by the current element, compared to `Iterable` which inverts control by exposing elements to an intermediary layer (the `for`) which performs the actual computation).
On the other hand a stream (as generally understood) represents a producer of elements, or alternatively a collection of one-shot (linear) elements, or alternatively a time-varying parameter (where we can't "go back in time"). That's a separate concern to looping, where something like an Iterator may be appropriate.
If we don't require that our code propagates the guarantees of a stream, it becomes trivial to `zip` two `Mappable`s: if they're finite and small we can get away with using map/forEach to build lists of the elements of each and count how many there are, then our new `Mappable` just maps over the range `0` to `count` and invokes the given function as `f([this.list1[i], this.list2[i]])`. If our `Mappable`s are large or infinite, we can zip them by performing two maps concurrently and using a shared reference to accumulate the current iteration's elements from each of the maps. The concurrency could be done with asynchronous calls, threads, lazy evaluation, etc. and would be safely encapsulated inside the resulting `Mappable`.
Or, alternatively, we could use Iterators, since this is a reasonable use case for them. I'd still avoid using them with `for` though; we could use some generic Iterable-zipper to compose our streams, and use map/forEach if we ever want to loop over it. Also note that it's fine for a class to implement `Iterable` alongside `Mappable`; I'm just saying that `Iterable` does not provide a good basis for looping, and it's unfortunate that a language might force us to do it anyway.
And the whole enumerator/stream business is now a mess in Java compared to C#.
In C# you have IEnumerable, which works perfectly with LINQ without needing a .stream() method call.
You don’t need to collect it explicitly, it will be just “materialised” at iteration time.
If you instead try to pass a stream, even casting it to an enumerator, and it is used twice your software will fail at runtime.
Honestly, I can't imagine how they managed to overcomplicate the Java streams implementation so much, after LINQ had been used successfully, and very elegantly, for years...
All else being equal, it's better to implement things in libraries rather than bake them into a language.
If we want to do something that requires a change to the language, e.g. the "async/await" stuff that seems to be popular these days, then we must convince the language designers that it's a good idea, wait for it to be implemented, wait for a new release of the language, and maybe (if we want our code to be portable) wait a few years for it to become widespread (e.g. new versions of RedHat, Debian, etc. to include it by default). Alternatively we could maintain a fork or patchset, which might be a lot of work and receive little support. We may also have to do this for multiple implementations, e.g. GCC + Clang + MSVC + .... We might also have to go through some form of standardisation effort (e.g. C++14, Haskell2010, etc.).
Our change may impact every user (e.g. if we add exceptions to a language, everyone must now worry about catching them, e.g. from callbacks). This impact may last for decades, even if that feature falls out of favour, since it must remain for backwards compatibility. Any subsequent language features must take our feature into account (e.g. if we added exceptions, and someone wants async/await, that person must consider how async/await deals with exceptions and vice versa).
On the other hand, if something can be implemented as a library then we don't have to wait for anyone: we can just write it. It will work with multiple implementations of the language. Those who don't want or care about our feature can just avoid the library. If the feature falls out of favour, we can just avoid importing that library in our new projects.
> it's better to implement things in libraries rather than bake them into a language.
That’s debatable and there’re also strong arguments why not. I don’t work on programming languages but I use them a lot. I generally prefer built-in features over libraries, here’s why.
Language designers are usually conservative, in a good way. If they implemented a feature into the language or into a standard library, it’s likely to stay. As a user, this gives me a good chance my code will compile by a version of compiler 10 years in the future.
Libraries are easier to fork, modify and evolve. This is good for development of language and libraries but not necessarily good for developers using them, first I need to pick one, then I need to ensure all developers on the project agree on the set of libraries and their version. Both take time. For languages with batteries included, I don’t need to worry about that.
Libraries baked into a language are of better quality, on average. They have a higher user count, so bugs are more likely to be found and fixed.
Not everything can be implemented in libraries. C++ designers made a mistake with multithreading, only fixed recently. The async-await feature is the same; note how in .NET and golang it spans the whole stack: libraries, compiler & language, runtime, OS kernel interface.
> Not everything can be implemented in libraries. C++ designers made a mistake with multithreading, only fixed recently. Async-await feature is the same.
Minor correction: in Haskell and F#, async/await can be, and is, implemented as a library.
I said "all else being equal" for a reason. All your points stem from the fact you ommitted that from the quotation ;)
For example, built-in features are usually better quality than libraries due to survivorship bias, caused by the difficulties I listed. Libraries can also undergo scrutiny, need for buy-in, etc. and I claim that if those are equally rigorous for both a language feature and a library, then the library would be preferable.
Likewise, rather than relying on the language designers to pick a consistent set of features, we could instead rely on some other external entity to pick a consistent set of features in the form of libraries. I claim that if those two groups are equally competent, recognised, etc. then the set of libraries would be preferable.
And so on.
As an aside, regarding your preference to have exceptions baked into a language: I personally think that exceptions are dangerous and confusion-inducing (compared to Maybe, continuations, etc.). If they were a library, I could ignore them; baking them into a language makes me less inclined to use that language ;)
> built-in features are usually better quality than libraries due to survivorship bias
Yes, they’re better. No, that’s not survivorship bias, the reason is more careful selection (for third-party built-in libraries like C++ standard library) or better design (for first party libraries like .NET framework or JRE).
> and I claim that if those are equally rigorous for both a language feature and a library
I claim that’s impossible in practice, especially for powerful languages with multiple ways of doing same things. Look at C++ for an example.
> we could instead rely on some other external entity
Again, this is theory. In practice there’s no such external entities. Picking consistent features = doing design. Collective bodies, like committees or markets, are terrible at designing things.
> regarding your preference to have exceptions baked into a language
You’re probably replying to someone else’s comment?
> baking them into a language makes me less inclined to use that language
Most mainstream languages include them. You can stop programming such languages but you can’t stop using software written in them, exceptions are baked in e.g. C++ and JavaScript languages.
> No, that’s not survivorship bias, the reason is more careful selection (for third-party built-in libraries like C++ standard library) or better design (for first party libraries like .NET framework or JRE).
"More careful selection" is exactly what I meant by survivorship bias, i.e. the only language features we encounter (outside one-person pet languages) are those which managed to survive this selection process. Rejected proposals aren't as visible, and we'll never know about all those ideas which were never even proposed due to their low chance of being selected.
Libraries don't have this selection pressure, so we see many more bad ideas, poor implementations, throw-away experiments and learning exercises.
As for better design, this is another survivorship effect: after a feature is selected (e.g. "we're going to add exceptions"), those features go through critique and consensus-building processes ("should they work like this? how should this edge-case be handled? etc.") which weed out low-quality suggestions and (hopefully) spit out generally better designs.
Libraries don't undergo as much scrutiny and collective-ownership, unless they become popular.
> I claim that’s impossible in practice
I never claimed it wasn't. Still, I'd point to projects like Boost for C++, Rails for Ruby, Numpy/Scipy/Pandas/etc. for Python, etc. as examples of third-party libraries which offer features comparable in quality, stability, etc. to that of language built-ins (but not necessarily their own language's features; e.g. C++ is way more stringent than Boost, but Boost's API may be more stable than e.g. Nim's).
> Collective bodies, like committees or markets, are terrible at designing things.
Like programming language features? ;)
> You’re probably replying to someone else’s comment?
> I'd point to projects like Boost for C++… as examples of third-party libraries which offer features comparable in quality, stability, etc.
On Windows I tend to avoid Boost when I can. At least when I worked on Boost-based C++ projects a few years ago, boost’s quality varied a lot between different parts of the library. Also it’s huge, affects complexity of build process, may affect compilation times, by default it links to some very large dynamic libraries i.e. complicates and inflates installer. Also they have a lot of code for compatibility with older versions of the language, they support C++0x. I usually have much newer versions, always C++14 usually 17. In short, with boost I feel like I’m paying for a lot of things I don’t need.
> Like programming language features? ;)
Some languages have core design teams, who aren’t a collective body, have clear shared vision, and are therefore able to do better design. When I’m looking at what’s happening with Rust, or Golang, or C#, I see rate of improvements faster than that of C++ or Python or JavaScript.
> All else being equal, it's better to implement things in libraries rather than bake them into a language.
I was in the extensible languages camp for so long. Now I believe that widely used stuff should be part of the language. The reason is that people love to write libraries to scratch their own itch, but it takes a whip to make them write libraries that work together: use consistent naming, accept and return the same types, avoid overlaps in functionality, etc. It's just nobody's job. Except if you call your collection of libraries a "language" and try your best to harmonize them. That's what a language should be. Being minimal is no virtue.
My favorite example is Unicode: the best languages are those that bake it in and force everyone to use it. Or exception handling: you could give people the tools to build their own error handling mechanism, or you could build one good mechanism (with deep language integration, stacktraces etc) and force everyone to use it. When everything uses the same kind of strings and the same kind of errors, programmers can focus on what matters.
You can sort of get the best of both worlds by having a set of "blessed" libraries that are publicized as supported by the language. This allows users to experiment and the maintainers to select well-built solutions.
The problem with tail calls is that they are implicit. That's why it's usually called tail call optimization. There is nothing in the syntax to indicate that you actually have a tail call. You may have made a mistake and it actually blows up the stack.
If I remember correctly, there was an interesting proposal, briefly discussed almost a decade ago, to add explicit tail calls to Python. You would use a keyword or syntax to perform tail calls explicitly (not the tail-call decorator hack). This would make tail calls an actual looping construct.
I try to use the phrase "tail call optimisation" when an implementation may perform it but the language doesn't require it, and "tail call elimination" when the language requires it (and hence any implementation which doesn't is non-conforming).
Tail calls were made explicit in Clojure, with the recur construct. It had more to do with the JVM than syntactic benefits, but it turns out to be a decent idea.
Tail calls do not just deal with recursive functions, in general TCO works on any function call in tail position. Maybe a more general recur construct (maybe called "jump") would be a good idea, but then you'd have to remember to use it.
Clojure is limited by the JVM, hence the need for recur. However, between it and trampoline you get most of the way to what TCO offers, albeit with the need for the programmer to be explicit.
While there isn't a special form for tail calls, they are not implicit. If the recursive call does something with the result, it's not a tail call. Unless you are being cute with closures, it's already explicit in the AST.
    return myself(...)        # tail call
    return var * myself(...)  # not a tail call
But there is indeed something to be said for a construct that enforces the tail-call guarantee. The point of SICP's Scheme, though, is to keep the number of language constructs to a minimum for educational purposes.
That's not right, unless the "... ..." is empty, and even then, it depends on whether x is a global variable, and even if it isn't, it depends on the language.
That's definitely not a tail call, and even with a smart compiler it's pretty hard that it would get optimized into "return f(a)".
If "x" is declared there and never mutated in the subsequent code, and the code in the "..." doesn't perform any side effects, then such an optimization could occur... but then again, if the code in the "..." in your example didn't perform any side effects, it'd be dead code.
Tail calls are semantic, not just an optimization, at least in languages like Scheme. Also, blowing the stack only applies to language implementations that have a fixed stack size, it's not a universal truth. In Scheme, a call stack is an implementation detail, and some implementations dynamically allocate stack space to accommodate recursive procedures. For example, here's a simple implementation of 'map':
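    ;; (the snippet was elided; presumably the classic
    ;; non-tail-recursive definition:)
    (define (map f lst)
      (if (null? lst)
          '()
          (cons (f (car lst))
                (map f (cdr lst)))))  ; recursive call is NOT in tail position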
You might look at this and go "that's bad! it will blow the stack when the list is too long!". Well, some Schemes will dynamically grow the stack so that this straightforward code translates to an efficient process that doesn't hit arbitrary stack limits. Otherwise the programmer would have to contort this into an iterative, tail-recursive process, which would involve building the list backwards and reversing it at the end.
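For comparison, the contorted tail-recursive version would look something like this (a sketch):

    (define (map f lst)
      (let loop ((rest lst) (acc '()))
        (if (null? rest)
            (reverse acc)                        ; undo the backwards build
            (loop (cdr rest)
                  (cons (f (car rest)) acc)))))  ; tail call: constant stack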
Apparently some implementations have tail-recursion modulo cons. You spot a tail-recursive function where the result is placed in the cdr, and you compile your code so that it chains cells by calling "set-cdr!"
For example (using CL), the code could be turned into two auxiliary map%% and map% functions (map% would be what a call to map expands into).
    (defun map%% (cons function list)
      (declare (optimize (speed 3) (debug 1) (safety 1))
               (type function function)
               (type list list)
               (type cons cons))
      ;; destructively chain a fresh cell onto CONS's cdr,
      ;; then tail-call ourselves on the rest of the list
      (when list
        (map%% (setf (cdr cons)
                     (list (funcall function (first list))))
               function
               (rest list))))

    (defun map% (function list)
      ;; seed with a dummy cell; the real result hangs off its cdr
      (let ((fresh (cons nil nil)))
        (map%% fresh function list)
        (rest fresh)))
Oh, sorry, I was reading people's comments about map. For both iteration and recursion, the compiler (and its author) has to think pretty hard. With higher-order functions it just says 'great'.
You can use tail recursion to create abstractions in a compositional manner that cannot be created compositionally with iterative constructs.
For example, if you can express a finite state machine (FSM) as a number of mutually (tail-)recursive functions, then you can reuse this FSM as part of a larger FSM via a tail call. You can't do this with the iterative constructs in C et al: the whole FSM must be defined in one spot.
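A hedged sketch of what I mean, in Scheme (the names are mine): each state is a function, each transition a tail call, and a whole sub-FSM can be reused just by tail-calling its start state.

    ;; recognise strings over {a, b} that end in "ab"
    (define (state-start chars)
      (cond ((null? chars) #f)
            ((char=? (car chars) #\a) (state-a (cdr chars)))
            (else (state-start (cdr chars)))))

    (define (state-a chars)
      (cond ((null? chars) #f)
            ((char=? (car chars) #\b) (state-ab (cdr chars)))
            ((char=? (car chars) #\a) (state-a (cdr chars)))
            (else (state-start (cdr chars)))))

    (define (state-ab chars)
      (if (null? chars)
          #t
          (state-start chars)))  ; any state can tail-call into another FSM

    (state-start (string->list "aabab"))  ; => #t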
Of course you can, and trivially so; just return the next function to call (as a pointer or some other identifier) as the result of each state function.
That's called a trampoline, and is one of the ways to simulate tail recursion. It requires a global modification to the code: you have to rewrite all the code that you want to be stack-safe to be trampolined. I wouldn't call that trivial, but then Node developers are happy to rewrite their code into CPS, so what do I know.
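A minimal sketch of the technique, in Scheme (where it isn't actually needed, since tail calls are already safe, but the shape is the same in any language): each state function returns either a final value or a thunk for the next state, and a driver keeps invoking thunks.

    (define (trampoline thunk)
      (let loop ((r (thunk)))
        (if (procedure? r)
            (loop (r))   ; bounce: r is the next state, packaged as a thunk
            r)))         ; done: r is the final value

    ;; mutually recursive states that never grow the stack:
    (define (even-state n)
      (if (= n 0) #t (lambda () (odd-state (- n 1)))))
    (define (odd-state n)
      (if (= n 0) #f (lambda () (even-state (- n 1)))))

    (trampoline (lambda () (even-state 1000000)))  ; => #t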
Doesn't it depend on how the iterations are compiled? Consider how e.g. a normal continuation is compiled in C#:
    foreach (var thing in Things)
    {
        if (thing.IsAlive)
            yield return thing;
    }
Those will compose (at least I think they will compose in the sense you mean) because they are already sugar for a state machine and not a simple iteration. The foreach construct here doesn't imply "loop here until done" in the sense that doesn't compose.
Any decent compiler will handle it just fine. If you want to include crappy compilers, well, then we need to include the same for LISP and Scheme. Whatever the language specification may claim about tail recursion is not relevant when you sit down to produce code for the actual system at hand.
This isn't a C vs Scheme (or whatever) discussion. It's about the utility of (tail) recursion vs iterative constructs. That C compilers may optimise tail recursion isn't relevant.
A Scheme compiler that doesn't do tail recursion isn't crappy; it is incorrect/nonconforming. A C compiler that doesn't do tail recursion still conforms to the abstract language semantics.
In Scheme, local functions are used for looping (via tail recursion). C doesn't even have those (other than as a GNU C extension).
Lisp idioms like (mapcar function list) replace both looping and recursion with an aggregate operation, similar to APL; it's just not obsessively condensed into a single character syntax.
I prefer looping constructs over recursion, almost every single time.
Loops are closer to what the CPU actually does. A compiler is therefore simpler to implement and will likely contain fewer surprises.
It’s easier to use a debugger on procedural-style or OO code (step in/over/out, local variables) compared to recursive functional style advertised by SICP.
In most runtimes, call stack is limited to a small size, couple of megabytes. Stack overflow exception is hard to handle in reasonable way. When instead using a loop with a stack container for state, it can be much larger, also easier to handle overflows i.e. implement soft and hard limits.
To be fair, Scheme implements tail recursion, which converts properly prepared recursion into an iterative loop, usually by turning the recursive call into a GOTO. So if used correctly, recursion in Scheme isn't a stack issue.
> which converts properly prepared recursion into an iterative loop
It only works if you are using recursion in place of a simple loop.
Many algorithms traditionally implemented with recursion (e.g. various tree searches) need that stack somewhere.
Technically it’s probably possible to “properly prepare” that, but practically both alternatives (recursion + built-in call stack, or loop + separate stack) are much easier to implement.
Explicit recursion is really a foundational primitive upon which other abstractions should be built, including looping constructs. It is similar to goto, in that it is a form of unstructured programming. Functional programmers prefer using maps and folds.
Loops are not necessarily close to what a CPU actually does either, especially one supporting vector operations or out-of-order execution. Loops imply an ordering which may or may not actually exist, and many compilers are complicated by their presence.
Debuggers are a problem for any suitably high-level abstract code. Would a step-by-step, word-at-a-time debugger help to debug a SQL query, or even a complex arithmetical expression from Fortran? There is no getting away from the fact that high-level domain-specific code needs high-level domain-specific debuggers built alongside it.
I was advocating recursion as a foundational primitive for a compiled high-level or intermediate language, not as part of some hardware instruction set. Current hardware is a bit of a legacy mess, we should resist the temptation to take on its baggage.
> functional programmers prefer using maps and folds.
I know and I don’t think these are always good things.
> Loops are not necessarily close to what a CPU actually does either
Indeed, not necessarily, but very often. Unless automatic vectorization is involved, long loops are very often quite close to the machine code uploaded to the CPU. Inside, the CPU is free to do whatever, but on the public API it's still loop instructions.
> the fact that high-level domain-specific code needs high-level domain-specific debuggers to be built alongside.
That’s not a fact, that’s your opinion. I don’t code SQL nor Fortran, mostly C++ and C#, sometimes Python and others. The debuggers I have in corresponding IDEs are already built, they are sufficiently high-level for domain specific code, but the debuggers themselves aren’t domain specific, they are pretty general purpose.
> I know and I don’t think these are always good things.
Maps and folds do make structure explicit though. For example, it is very easy for a compiler to vectorise a "map" operation. Folds (over finite structures) will always terminate.
> The debuggers I have in corresponding IDEs are already built, they are sufficiently high-level for domain specific code.
But does your debugger allow you to visualise the intermediate steps of an arithmetic expression? Does it understand Iterables/Enumerables and allow you to capture and inspect them? Can it handle concurrency and parallelism, or help you make sense of callbacks for non-blocking IO etc?
My point is that the debugger is simply not as general purpose as most claim it to be. It sees the world word-at-a-time and only allows you to step statement-by-statement. It does not understand the composite data structures or control structures that you have built. This is less of a problem with low-level code, as typically there is limited abstraction. But it will be a problem for any sufficiently high-level abstract code (functional or otherwise).
> it is very easy for a compiler to vectorise a "map" operation
For a general map function, it's an insanely hard problem. It's only trivial if the mapped function is trivially simple, e.g. `return x*2`.
> does your debugger allow you to visualise the intermediate steps of an arithmetic expression?
No, and I write my code in a way so the expressions aren’t too complex. When they grow to be that complex, I break them into multiple steps, keeping intermediate results in separate variables. BTW my primary motivation is readability, debugging is secondary.
> Does it understand Iterables/Enumerables and allow you to capture and inspect them?
I think so.
> Can it handle concurrency and parallelism, or help you make sense of callbacks for non-blocking IO etc?
Sure, debugging async-await code in .NET is almost as simple as old school non-scalable blocking IO code.
> But it will be a problem for any sufficiently high-level abstract code
Not a huge problem in practice. There’re simple ways to implement support for your custom composite data structures, e.g. natvis files in VC++. Also, in .NET you can run any code in debugger’s immediate window, while execution is paused. These simple methods aren’t effective in 100% cases, but often they are sufficient to debug, therefore saving a lot of time because no domain-specific debuggers are required.
Wow, that's fantastic that Microsoft have extended the debugger to understand streams and async/await code. This of course only works because they develop both the abstractions and the debugger. It's probably not so easy for the rest of us to extend it for our own abstractions.
> It's probably not so easy for the rest of us to extend it for our own abstractions.
Depends on the abstraction. For idiomatic stuff it’s relatively easy to extend the debugger, see e.g. [1] about writing custom debug visualizers for very complex objects (for reasonably complex there’re much simpler ways).
For non-idiomatic stuff such as extending the language by modifying the compiler, or e.g. post-processing the compiled output, I don’t know because I never needed debugger support.
> Loops are closer to what the CPU actually does. A compiler is therefore simpler to implement and will likely contain fewer surprises.
I don't find the argument convincing. By that logic, assembly is closer to what the CPU actually does, so why not write code using goto (or in binary)? We can all agree that it's easy to write reasonably good code in high-level languages. I think that's because high-level languages are closer to the problem domain (or our thinking), hence it's easier to express the solution in these languages. Same for recursion.
> It’s easier to use a debugger on procedural-style or OO code (step in/over/out, local variables) compared to recursive functional style advertised by SICP.
I feel exactly the opposite. OO by definition has state bound to the object, outside the method (not actually a local variable), so you have to watch all the objects the method is affecting. Not a pleasant experience when you have hundreds of objects. Recursive code, on the other hand, usually depends only on its local arguments, so you can focus on just the current step.
> In most runtimes, call stack is limited to a small size, a couple of megabytes. Stack overflow exception is hard to handle in a reasonable way. When instead using a loop with a stack container for the state, it can be much larger, also easier to handle overflows i.e. implement soft and hard limits.
Well, depends on the platform you are working on. For desktop at least it's definitely not "a couple of megabytes".
I feel that most of the time the choice is between readability and performance. Sometimes the solution is inherently iterative/recursive, making the choice easier. YMMV though.
> I think that's because high-level languages are closer to the problem domain (or our thinking)
Mostly agree.
> same for recursion
I don't know about you, but in my problem domains loops are closer than recursion. Also loops are closer to my thinking. YMMV.
> Not pleasant experience when you have hundreds of objects
No one forces you to design your OO apps this way. When you only have a few objects, or when you only process them sequentially, there's no need for a complex OO model; functional or procedural code will do fine.
However, there are apps with hundreds of interacting objects, such as a web browser.
> For desktop at least it's definitely not "a couple of megabytes".
On Windows it’s usually (=for apps linked with default settings) 1 megabyte.
If you are dealing with something other than a sequence, such as a tree or a more general network, recursion is often the more insightful way to think about a solution.
Actually, the simplest example I can think of where this is the case involves a sequence: binary search.
On the other hand, if you have to significantly rework the conceptual recursive solution to put it in tail-recursive form, an implementation using loops may end up being easier to understand.
Closer to the CPU isn't a valid argument. Most major programming concepts don't exist in a CPU.
* CPUs are dynamically scoped
* CPUs are weakly typed
* CPUs are dynamically typed
* CPUs only compare to zero
* CPUs have no way to represent OOP
and so on
All that said, tail calls are very close to how the CPU works. They are just GOTO with additional parameters. If you implement tail calls with dynamic scoping, the parameters would be registers and the tail call would be a change to the instruction pointer. That would be very close indeed.
But from a pragmatic standpoint, it’s just too easy for someone to turn a tail-recursive function into non-tail recursive, causing a stack overflow eventually.
So when working in a large team, with run-of-the-mill programmers, I’d stick with loops.
Actually you don't need any construct other than goto. You don't even need functions.
But for some reason given the choice, people use them.
It makes things easier to reason about for most day to day problems.
I train people in programming for a living. "Map" is harder to understand than "for" for most of my students. "Reduce" is harder than manual looping. Recursion harder than iteration. Immutable harder than mutable.
Now I always end up teaching the entire toolbox. Those are all very useful concepts that I use in production and I do want my students to have a large solution set to their future problems.
But there is a clear pattern in my teaching experiences.
I would be really careful about making sweeping claims about teaching programming. There are many variables at play here: your students' background, how you teach, what you teach, and so on. I doubt you have controlled for all of these.
I have exactly the opposite experience teaching programming. My students grasp recursion very quickly, immutable is easy, etc. In particular, students without prior programming experience pick up these FP concepts very quickly.
>Actually you don't need any construct other than goto. You don't even need functions.
>But for some reason given the choice, people use them.
There is no choice when it comes to functional programming, because the paradigm is based on Alonzo Church's model of computation, the lambda calculus, which is declarative: everything is an expression or, more specifically, a function. Because of that, statements like goto are forbidden. Goto is from Alan Turing's model of computation, the Turing machine, which is imperative: everything is a statement.

Guy Steele said that the lambda expression in Scheme is the ultimate GOTO, because it's like an unconditional jump while remaining declarative and functional.
That's also my experience mentoring and working with a range of talented programmers.
Functional programming is great when you wrap your head around it, but it's a significant mental drag to understand. I haven't met people who honestly find it easier, if anything, it's more bragging rights that you can do it. (And useful bragging rights, when you can hack monads, lensing, folds, etc. you can solve some very challenging problems.)
I will make an observation in defense of functional programming: when you start to get "data pasta" (my term for the data equivalent of spaghetti code) where objects are linked all over the place and side effects abound, it becomes similarly incredibly difficult to reason about. What starts out simple quickly leads you into a trap.
That's my motivation for designing a functional language that uses typical imperative control structures, which are then translated into a functional form. I think you get a very simple mental model: imperative control, referential transparency and locality of effects. It also forbids pointers and recursive data structures; that was originally just so I could guarantee that reference counting was sufficient, but I'm now realizing that once you add a simple "path" type, you can do everything you want, and it also helps to avoid highly complex structures that are difficult to reason about.
I originally learned programming with BASIC, then Pascal, then C and assembler. So I didn't start out with map/reduce, but I find them very intuitive now. In fact, I got annoyed with a new non-imperative language I was learning, because I was looking for map and reduce, and I eventually found map but not reduce.
> It makes things easier to reason about for most day to day problems.
> I train people in programming for a living. "Map" is harder to understand than "for" for most of my students. "Reduce" is harder than manual looping. Recursion harder than iteration. Immutable harder than mutable.
Careful! Tools that are easy for a trained expert to use and tools that are easy for a newbie to grasp are not only different sets of things — I'd go as far as to say that overlap between the two is very rare indeed.
I made it a goal of mine about 10 years ago to avoid explicit for-loops. It turns out using declarative style makes things more readable and safer and better optimized by the compiler. I can usually tell the purpose of the loop by just reading the function name (map, filter, ..) and the lambda. Sometimes parallel comes for free. Plus, all those off by one errors, iterator errors from erasing elements, etc. are all eliminated :)
For loops don't have to mean counters. We've had automated for loops for years: generators and comprehensions, which are basically map/filter integrated into the syntax. E.g. Python has map and filter, but for that reason they are rarely used.
> It turns out using declarative style makes things more readable and safer and better optimized by the compiler.
I don't think most of my loops are straightforward maps or filters of a single iterable. I need to reference various data sources and execute actions with side-effects. To do that using a functional framework, something like Haskell's mapM ("monadic map") is required. But that brings quite some mental overhead and I find it way too constraining for the typical case.
> more readable and safer and better optimized by the compiler.
I doubt these are in general qualities of non-builtin constructs vs builtin ones. But yes, too much builtin and non-orthogonal functionality leads to detrimental complexity.
In any case I don't think it matters much to most compilers whether you write a loop using syntactic sugar or as labels and jumps.
I haven't really encountered a loop that there isn't a function for. There ought to be an exhaustive list of all the different kinds of loops in some computer science textbook. But most of what you need is covered by zip, fold, map, reduce, filter, find, sort, reverse, transpose, permute, concat, any, all, max, min, cycle, take, ..
I’ve periodically done microbenchmarks of inner loops over the years and the various gcc outputs of the c++ loop functions are always performing slightly better than if I hand rolled. Perhaps with gcc7 or clang this is different.
Cool, so if that's all we should just use those. But wait, what about those dozens of megabytes of data, should we really zip/transpose/concat them? We should use non-strict evaluation then...
Point being, I'd rather not have this complexity. Do you mean to explicitly take each next value from the stream? And then possibly throwing all this "lazy" work away? Or memoizing the work and writing extra code that checks if the work was already done when the data is accessed? And then probably forgetting to get rid of the data when it's no longer needed?
No thanks, that's even worse. It's honest, but that doesn't make it a good idea. I like to do one thing at a time, not jump around like a wild horse. Turns out optimizing for not jumping around too much also makes programs simpler overall.
I've recently cleaned up some code of this kind. Defines objects with unsuitable lifetime and tries to be smart about reusing work instead of planning out in which order to do things in the first place. While the actual task was not difficult, the code was incomprehensible, to state it in a diplomatic way.
Not sure where you get all these ideas of complexity from; lazy streams are used a lot in functional languages for data processing and they are not incomprehensible. Also, imperative languages like Rust have iterators that are used more than for/while loops.
Well, I described the situations from which I get my ideas of complexity... Regarding iterators, I'm fine with their usage depending on what they _do_. However I don't think it's a good idea to iterate over an abstract stream that under the hood does sorts and zips and folds. Because you lose control and understanding how much intermediate data will be produced (constant small amount? 3 times the amount?), and in what order code gets executed.
And if you think consciously about the intermediate data you probably can use it for a whole lot of other things!
In Haskell, unexpected memory expansions and duplication of work can happen. And these problems are not nice to debug.
I might be generalizing too much, though. Maybe you're talking about short-lived web applications that never see a large amount of data, or something like that.
> I’ve periodically done microbenchmarks of inner loops over the years and the various gcc outputs of the c++ loop functions are always performing slightly better than if I hand rolled. Perhaps with gcc7 or clang this is different.
Any code examples?
What I was thinking of is hand-rolling
    for (a; b; c)
        BODY

to

    a;
    loop:
        if (!b)
            goto out;
        BODY
        c;
        goto loop;
    out:
        ...
My thought about hand rolling is that any loop is hand-rolled if I tell the compiler how to do things (i.e. not declarative). Writing out a for loop with an s += *p++ for summing the elements in a vector is hand-rolling a "reduce". You can add all the tricks you want, like changing the stride, doing SIMD, prefetching, but I've learned to let the compiler or the math/data library do its job.
For small data (MB) the compiler or library can parallelize things nicely. Bigger data (TB+) can be chewed through by using MPI-aware libraries that optimize and schedule and everything. Database work is already declarative because SQL is just so awesome. C# and Java now handle fusion and streams more nicely, and C++ can inline and r-value a lot of overhead away.
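In Scheme terms (assuming SRFI-1's fold), the same contrast looks like:

    ;; hand-rolled "reduce": we spell out how to walk the data
    (define (sum-hand lst)
      (let loop ((rest lst) (s 0))
        (if (null? rest)
            s
            (loop (cdr rest) (+ s (car rest))))))

    ;; declarative: say what we want, let the library do its job
    (define (sum-fold lst)
      (fold + 0 lst))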
Yeah you need to be careful what kind of iteration you do. Modifying pointers is likely problematic. Iterating over a range of integers on the other hand should be easy to optimize.
In most cases, it shouldn't matter and it's much more important to think about the data accesses, due to the relative slowness of memory.
The weird thing about your pattern is that class.a() and class.b() may have side effects and don’t seem to depend on i at all. If I were to optimize this loop or even make it parallel, then I’m certainly missing some information. Some information that could be captured by another expression.
Beware the conclusions drawn from the first read through the writings of our functional programmer forefathers (and the second, and the third), for "a little knowledge is a dangerous thing". The unprepared mind of the imperative programmer is especially prone to following the revelations of the old masters before achieving real understanding.
A while ago, I was working in company X, which sold a web platform for job boards, with multiple big and famous clients. One time, there was a bug in our platform that had to do with a for-loop with an always-true condition. The loop (which was in our core platform code and so deployed in every single website we maintained) caused untold havoc, as resources disappeared down a black hole and all our servers dropped to their knees and asked for mercy.

In the aftermath of the incident, when an explanation had to be given to those UpStairs running the show, it was put to them that the cause was an (effectively) unconditional loop. Now, as a lowly junior dev, I was not present in the deliberations but, given what followed, I can imagine the conversation:
Those UpStairs: "How can we ensure that the same thing never happens again?"
Senior Software Person: "Well, there is no way to ever be 100% safe. A for-
or while-loop can always go wrong."
Those UpStairs: "Is there any alternative to for- and while-loops?"
Senior Software Person: "Not really. I mean, there is recursion of course..."
Those UpStairs: "Recursion? What is that?"
Shortly after, we had a big emergency meeting to discuss What Should be Done to Avoid the Same Problems in the Future, where we were instructed not to use the looping constructs of the language in which our platform was written, anymore. Instead, we should only ever use recursion to write loops.

The language of our platform was C#, mind. A language that (to my knowledge, and certainly at the time, around 4.0) does not even support tail-call optimisation. Not to mention, it's just as easy to go into an infinite recursive loop as it is to go into an infinite imperative loop, if not easier.

Suffice to say that nobody followed the instructions given from On High, either because they found it too much work, or because they found it too stupid, or because they were just busy pulling their hair out.
It's just as easy (if not easier) to make a mistake with a recursive call such that it never terminates. Why did your team suggest it as an alternative and why was it accepted? TBH the whole story seems very hard to believe.
Of course it was a bad decision. Like I say, I wasn't there when the decision was taken, but I don't think it was suggested by anyone in the tech team. Instead, what I imagine happened is that, like in the (completely speculative) dialogue in my previous comment, someone who had no idea what a for-loop is decided they're dangerous, asked if there is any alternative, and when one of the techies present blurted out "recursion", the management person figured it was better than for-loops.
I commend your skepticism, of course, but I don't think that's so hard to believe. I'm sure we all have stories of non-technical managers making stupid decisions based on complete misunderstanding of computers and programming languages and principles, after a quick explanation by a technical person.
I mean, imagine trying to explain recursion to someone who doesn't even really get for-loops.
I wouldn't replace loops with recursion, generally speaking. It's probably at least as hard to write and understand explicit recursion as it is a for loop, and it's still fairly prone to off-by-one errors.
Explicit loops have a use, as does explicit recursion, but in the vast majority of cases, I use higher-order functions, like map, filter, flatmap, and fold.
The real problem is choosing tools that have more power than you need. This increases the surface area for potential bugs and it requires future reader to parse more code that's just there for plumbing.
Can you explain this further? If you want to work your way through an array, for instance, you return once you have an empty array. If your result is stored in a data structure that can be appended, I don't see how it's possible to have an off-by-one error. You must be referring to another use case.
Yeah -- in some nontrivial recursive situations, you might have multiple base cases or different 0 and 1 cases. I've just often run into situations where testing my algorithm reveals that one of the cases isn't exactly right, and therefore the result doesn't match expectations. It's much more difficult to get a `map` wrong, though. What could possibly go awry? I guess the moral is, more power, more responsibility, so it's best to save power for when you absolutely need it.
C# has foreach loops, available since the very beginning.
C# also has LINQ, available for many years i.e. since 3.5.
Also, when management asks "How can we ensure that the same thing never happens again?", they're rarely interested in the language level. In my experience, they don't care whether the problem was caused by for or any other f-word in the code. This question is usually about QA and deployment workflows.
Luckily I had been exposed to recursion previously, what blew my mind while watching the SICP videos was the implementation of car and cdr using only lambda and the realization that the language could go without variables.
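For anyone who hasn't seen it, a sketch from memory (close to the SICP exercise): a pair is just a procedure that remembers its two arguments and hands them to whatever selector you pass in.

    ;; shadowing the built-ins, as SICP does
    (define (cons a b) (lambda (pick) (pick a b)))
    (define (car pair) (pair (lambda (a b) a)))
    (define (cdr pair) (pair (lambda (a b) b)))

    (car (cons 1 2))  ; => 1
    (cdr (cons 1 2))  ; => 2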
Does anyone know a good time in one's career to take time out and spend it on going through SICP and videos lectures with it?
I have a STEM (non-CS) degree but very recently, started working as a programmer. Doing my first internship at a 'techy' corporate company. One of the things I'd be doing on the side is to go through https://mitpress.mit.edu/books/elements-computing-systems. The video introduction to it just gives me goosebumps to this day. I don't know why. It's fascinating.
Then there are other paradigm books such as "The Pragmatic Programmer", "Think Like a Programmer", "Design Patterns: Elements of Reusable Object-Oriented Software", etc.
So far, I've done plenty of reading on algorithms & data structures, and LeetCode here and there. In the past, I didn't qualify for internship at Google, Amazon or M$ as a software developer/engineer. I feel the reason why I didn't was because I didn't have a "computer science" keyword mentioned anywhere on my resume. Hence, my hope is that after getting through "Elements of Computing Systems" & Design Patterns, I'll be able to shove the "computer science" keyword one way or another onto my resume.
> Does anyone know a good time in one's career to take time out and spend it on going through SICP and videos lectures with it?
I guess any time is good. It's generally a class you take as a freshman or sophomore. People who attend top engineering schools generally played around with these topics in high school or even in elementary school.
> In the past, I didn't qualify for internship at Google, Amazon or M$ as a software developer/engineer. I feel the reason why I didn't was because I didn't have a "computer science" keyword mentioned anywhere on my resume.
Or you are competing against the best of the best CS students from around the world. People who aren't blown away by introductory stuff like elements of computing systems or who are just looking into SICP. I really don't think missing a keyword is why you didn't get the internship at google, amazon or microsoft.
> Hence, my hope is that after getting through "Elements of Computing Systems" & Design Patterns, I'll be able to shove the "computer science" keyword one way or another onto my resume.
I've never had computer science on my resume though I majored in it. For my internships, I just listed the courses/projects I worked on. For my first job, I just listed the degrees I had ( BS and BA and the gpas for both ), along with the projects I worked on and the internships I had. After that, I just listed the university and degrees and my work experience. "Computer science" isn't something I ever listed in any resume. But then again, I was never really good at writing resumes.
I think you are really overthinking things but feel free to do what you want. If you are so intent on putting computer science on your resume, why not just major in computer science?