> In functional programming, you typically do not manipulate state directly; instead, you compose functions that manipulate that state into more complex functions. (In Haskell, for example, this composition is often done through monads.) Eventually, you get a function that represents the whole program.
A bit more detail. In Haskell this is implemented very elegantly with list fusion. If you write
map (\x -> x+1) mylist
You'll map over a list and add one to every element. This allocates a whole new list where all the elements are increased by one [0]. Now let's say we take the output of that map and map over it again, but this time we multiply every element by two:
map (\x -> x*2) (map (+1) mylist)
Naively implemented, that would allocate two lists: one where all the numbers are incremented, and another where all the incremented numbers are multiplied by two. Any good dev will know that's performance poison. So the Haskell compiler implements "list fusion" – it sees that you don't use the intermediate list for anything, and rewrites your code so it's equivalent (in semantics and performance) to:
map (\x -> (x+1) * 2) mylist
(For the compiler devs in here, this optimization is commonly known as deforestation.) This leads to huge speedups in many cases. But there's a problem. If elsewhere you use the result of `map (\x -> x+1) mylist` for anything besides mapping over it exactly once, this optimization is impossible! So list fusion has a reputation for being fragile – the compiler has to know with 100% certainty that you aren't going to use the list for anything else, and sometimes innocent-seeming changes (like moving the intermediate map into another module) can break it.
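The essence of the rewrite is the map composition law, `map f . map g == map (f . g)`. (GHC's actual fusion machinery works through `build`/`foldr` rewrite rules rather than this law directly, but the observable result is the same.) A quick sketch checking that the two-pass and fused pipelines agree:

```haskell
-- Two passes: allocates an intermediate list (before fusion kicks in).
twoPass :: [Int] -> [Int]
twoPass xs = map (* 2) (map (+ 1) xs)

-- One pass: what the fused code is equivalent to.
fused :: [Int] -> [Int]
fused xs = map (\x -> (x + 1) * 2) xs

main :: IO ()
main = do
  let xs = [1, 2, 3]
  print (twoPass xs)              -- [4,6,8]
  print (fused xs)                -- [4,6,8]
  print (twoPass xs == fused xs)  -- True
```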
The solution the Haskell community finds promising is to be able to give a value a tag, like a type only not quite, that says "I promise to use this value exactly once". If you use it twice or zero times, the compiler will yell at you. The compiler is still on the hook for the analysis of what's used once and what's used multiple times, but now the programmer and the compiler are guaranteed to be on the same page.
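GHC ships this tag today behind the LinearTypes extension (GHC 9.0+), spelled as the `%1 ->` arrow. A minimal sketch:

```haskell
{-# LANGUAGE LinearTypes #-}

-- The linear arrow (%1 ->) promises the argument is consumed exactly
-- once. Using x twice, or not at all, in either body below would be
-- a type error.
linearId :: a %1 -> a
linearId x = x

-- Pattern matching consumes the pair once; each component is then
-- itself used exactly once.
swap :: (a, b) %1 -> (b, a)
swap (x, y) = (y, x)

main :: IO ()
main = print (swap (1 :: Int, 2 :: Int))  -- (2,1)
```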
As for the other issue mentioned in the original post, of modifying a subelement of a tree: this is a well-known problem and there are many solutions. If the tree is only used once, the same optimization as list fusion can be applied to mutate the tree in place (although the "you must use the value only once" restriction doesn't help quite as much as you'd think it would). The more common solution, which doesn't depend on compiler optimization at all, is to use a data structure that supports this efficiently. For example, if you have a tree and you want to change one leaf on the tree, the only thing you really need to copy is the part of the tree that's actually different now - for the unchanged branches, the new tree just has a pointer to branches of the old tree, so they can be (and are) reused without copying. That's why it's very common to see maps implemented in functional languages using trees, instead of hashmaps. With a hashmap, it's much harder to get around the need to copy the whole thing when you just want to change one part.
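A minimal sketch of that path copying, using a hypothetical `setLeftmost` update on a toy binary tree:

```haskell
-- A persistent binary tree: updating one leaf copies only the path
-- from the root to that leaf; untouched subtrees are shared.
data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

-- Replace the leftmost leaf. Only the nodes along the left spine are
-- rebuilt; every right subtree of the new tree is the *same* heap
-- object as in the old tree (pointer sharing, no copying).
setLeftmost :: a -> Tree a -> Tree a
setLeftmost x (Leaf _)   = Leaf x
setLeftmost x (Node l r) = Node (setLeftmost x l) r

main :: IO ()
main = do
  let old = Node (Node (Leaf 1) (Leaf 2)) (Leaf 3)
      new = setLeftmost 9 old
  print new  -- Node (Node (Leaf 9) (Leaf 2)) (Leaf 3)
```

This is exactly how `Data.Map` and friends stay efficient: an update touches O(log n) nodes and shares the rest.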
[0]: Well, it might do that once it gets around to it, laziness etc., but let's ignore that for now.
> The solution the Haskell community finds promising is to be able to give a value a tag, like a type only not quite, that says "I promise to use this value exactly once". If you use it twice or zero times, the compiler will yell at you.
I suspect the contortions required to fit your code into that affine type style will likely lead to code that is just as hard to understand and maintain as the equivalent imperative code (though with better static checking, which might be nice).
It's surprisingly convenient when you get used to it. A very large proportion of values are only used once, so if a function only uses a value once you can mark it as being linear in that argument and get the relevant guarantees for very little mental overhead. You can still pass any value you want to the function, the only restriction is that the function must consume the value exactly once. It's tricky to write performant functional code, but personally I find linearity tagging a small price to pay for salvation (see this article I wrote, posted here: https://news.ycombinator.com/item?id=30762281 ). (Note that GHC currently doesn't use linearity for any performance optimizations, this is just a promising possible route.)
One tidbit you might find interesting. Linear logic requires you to use the value exactly once, but if you relax that requirement to "once or zero times" you get affine logic. (Values that don't have any restriction on how much you can use them are called "exponentials".) With both linear and affine logic, you can pass an exponential to a function that's linear or affine in its argument – the only requirement is that the function uses the value in a linear or affine way. That's not quite the same as what Rust does, which I do find a bit annoying. What Rust does is called "uniqueness typing", where a function can always do whatever it wants with any values it gets, but it can mark its parameters as "unique" and the caller has to make sure that any value passed as a parameter to that function is never used anywhere else. This is arguably the more useful of the two, because it means that you can mutate any argument marked as unique and no one can tell, but if you design a language around that you get Rust, and I find Rust a bit less pleasant to program in than Haskell.
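That "exponential into a linear function" direction can be seen directly in GHC's LinearTypes: the promise constrains the function body, not the caller. A small sketch:

```haskell
{-# LANGUAGE LinearTypes #-}

-- dup is unrestricted ("exponential"): it uses its argument twice.
dup :: Int -> (Int, Int)
dup x = (x, x)

-- once is linear in its argument: its body must use it exactly once.
once :: a %1 -> a
once x = x

main :: IO ()
main = do
  let v = 5
  -- The caller may freely use v several times, including passing it
  -- to the linear function once; linearity restricts only once's body.
  print (once v + fst (dup v))  -- 10
```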
> For example, if you have a tree and you want to change one leaf on the tree, the only thing you really need to copy is the part of the tree that's actually different now - for the unchanged branches, the new tree just has a pointer to branches of the old tree, so they can be (and are) reused without copying.
This still only works if a child has one / a known-ahead-of-time number of parents. If you need to update an object that N objects point to, you need to update all N references.
It just doesn't really happen that much in functional programs, to be honest.
The very concept of n objects pointing to the same, commonly mutated data is something that can be addressed easily by having that data contained in a parent structure. Since functional programs don't think in terms of "methods" within each object trying to access data, but in terms of external functions manipulating all the data you need, the only change is the way you'll pass your data to your functions.
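A small hypothetical sketch of that restructuring (the `World`/`Settings` names are made up for illustration):

```haskell
-- Instead of N widgets each holding a pointer to shared, mutable
-- settings, the settings live once in a parent record, and functions
-- take the whole World. "Mutation" is a record update producing a new
-- World that shares everything unchanged.
data Settings = Settings { fontSize :: Int } deriving Show
data World = World { settings :: Settings, widgets :: [String] } deriving Show

-- One update in one place; every "reader" sees the new value because
-- readers receive the World, not a stale private copy.
bumpFont :: World -> World
bumpFont w = w { settings = (settings w) { fontSize = fontSize (settings w) + 1 } }

main :: IO ()
main = do
  let w  = World (Settings 12) ["editor", "sidebar"]
      w' = bumpFont w
  print (fontSize (settings w'))  -- 13
```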