
Maybe there should still be a compiler option to automatically evaluate the degree of abstraction complexity of code, much like cyclomatic complexity, to prevent abuses?



Yes, but remember Sanchez's Law of Abstraction[0]: abstraction doesn't actually remove complexity, it just puts off having to deal with it.

This may be a price worth paying: transformations that actually reduce complexity are much easier to perform on an abstraction of a program than on the whole mess with all its gory details. It's just something to keep in mind.

[0] https://news.ycombinator.com/item?id=22601623


Abstractions don't remove complexity. Abstractions instead hide the appearance of complexity behind layers of ever-increasing code.

I'd add that complexity should only be abstracted where the abstraction makes the rest of the system easier to understand than the code it hides. Too much abstraction can make systems harder to understand and work with instead of easier.

Yes. But I'm not sure that argument is any more valid here than it is when talking about high level programming languages vs assembly. Of course abstraction hides complexity—that's kind of the point.

I'm not saying all abstractions are bad. The problem is adding abstractions when they're not actually needed, which increases complexity instead of reducing it.

There aren't any abstractions in any language or library that don't leak everything about what they are trying to hide as well as everything about their own implementation. That's just life. It's impossible to hide complexity. Whatever wraps one thing will be strictly more complex than the wrapped thing was.

There is, ultimately, a direct conflict between abstraction and efficiency. Abstraction gets its power by using indirection: generalizations that stand in for specific cases. Making abstractions concrete involves a flattening of all these indirections. But there's a limit on our ability to automate the manipulation of symbols - we don't have Strong AI that is able to make the leaps of insight across abstraction levels necessary to do the dirty hackish work of low-level optimization. The fabled "sufficiently smart compiler" doesn't exist, and is unlikely to until we have Strong AI.

I'll further submit that a design that has clean abstractions and simple and small implementations either doesn't do much to the point that it's not very useful on its own, or if it is useful the complexity has moved somewhere else, perhaps in metadata or configuration or tooling. It's like there's a law of conservation of complexity; there is a certain amount of irreducible complexity in useful programs that cannot be removed, and remains after all accidental complexity has been eliminated.


> One solution is languages that don't allow much abstraction.

I think we need to be careful here with what you mean by "abstraction." One kind, which we can call "syntactic abstraction," means the following: given a specific language, what portion of its syntactic patterns can be factored into a new construct that "abstracts" all of its instantiations. Languages with macros score very highly here. Another kind, which has been studied more mathematically, is what we can call "algorithmic abstraction" or "behavioral abstraction": given a certain specification of a desirable observable behavior, what portion of the implementations of that behavior can be exposed with the same syntactic construct (the same API).

On this front, Haskell has worse abstraction than Java (e.g. if you want to change the implementation of a map to support big data with disk writes, you have to change the syntactic construct), and a language like Rust has terrible algorithmic abstraction, as even different implementation details like memory lifetimes require different constructs. Every additional piece of implementation detail that is exposed at the syntax level (e.g. in the type) hurts abstraction. I don't think that proponents of exposing as much as possible in the type think that's a bad thing, though.
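To make the second kind concrete, here's a minimal C++ sketch (the KeyValueStore names are made up for illustration): callers depend only on the observable behavior, so an in-memory implementation can later be swapped for, say, a disk-backed one without touching any use site.

    #include <map>
    #include <string>
 
    // Behavioral abstraction: the API exposes only observable behavior.
    struct KeyValueStore {
        virtual ~KeyValueStore() = default;
        virtual void put(const std::string& key, const std::string& value) = 0;
        virtual std::string get(const std::string& key) const = 0;
    };
 
    // One implementation keeps everything in memory; a disk-backed one
    // could replace it behind the same interface.
    struct InMemoryStore : KeyValueStore {
        void put(const std::string& key, const std::string& value) override { m[key] = value; }
        std::string get(const std::string& key) const override { return m.at(key); }
        std::map<std::string, std::string> m;
    };
 
    // Use sites name only the interface, never the implementation.
    void record(KeyValueStore& store) { store.put("user", "alice"); }
Exposing implementation detail in the type (a template parameter, a lifetime annotation) is exactly what breaks this: every use site then has to spell out the detail it was supposed to be shielded from.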

Both of these kinds of abstraction allow you to "leverage existing knowledge in new domains", and both have a certain "mental" cost. I don't think it is possible to relate "abstraction" in general to any bottom-line result, especially as no language choice seems to show any big impact one way or another; they mostly cater to different personal aesthetic preferences.


Thank you for saying it; I couldn't agree more. Premature abstraction so often produces code that is later difficult to modify and understand, and leads to a vicious circle of adding more abstraction and complexity.

The "death by a thousand abstractions" problem must be one of the most difficult ones to solve. Each abstraction in isolation makes sense and looks like an improvement, but given a few years, the overall code becomes an inscrutable mess of layers upon layers of complexity, hiding away what it actually does.

The whole point of an abstraction is to remove complexity for the user.

So I assume you mean "implementation complexity", but that's irrelevant, because that cost only needs to be paid once; put the abstraction into a crate, and millions of people can benefit from it.


There is one major problem with -- or, rather, cost to -- zero-cost abstractions: they almost always introduce accidental complexity into the programming language originating in the technical inner workings of the compiler and how it generates code (we can say they do a bad job abstracting the compiler), and more than that, they almost always make this accidental complexity viral.

There is a theoretical reason for that. The optimal performance offered by those constructs is almost always a form of specialization, AKA partial evaluation: something that is known statically by the programmer to be true is communicated to the compiler so that it can exploit that knowledge to generate optimal machine code. But that static knowledge percolates through the call stack, especially if the compiler wants to verify — usually through some form of type-checking — that the assertion about that knowledge is, indeed, true. If it is not verified, the compiler can generate incorrect code.

Here is an example from C++ (a contrived one):

Suppose we want to write a subroutine that computes a value based on two arguments:

    enum kind { left, right };
 
    int foo(kind k, int x) { return k == left ? do_left(x) : do_right(x); }
And here are some use-sites:

    int bar(kind k) { return foo(k, random_int()); }
    int baz() { return foo(left, random_int()); }
    int qux() { return foo(random_kind(), random_int()); }
    
The branch on the kind in foo will represent some runtime cost that we deem to be too expensive. To make that “zero cost”, we require the kind to be known statically (and we assume that, indeed, this will be known statically in many callsites). In this contrived example, the compiler will likely inline foo into the caller and eliminate the branch when the caller is baz, and maybe in bar, too, if it is inlined into its caller, but let’s assume the case is more complicated, or we don’t trust the compiler, or that foo is in a shared library, or that foo is a virtual method, so we specialize with a "zero-cost abstraction":

    template<kind k> int foo(int x) { return k == left ? do_left(x) : do_right(x); }
This would immediately require us to change all callsites. In the case of baz we will call foo<left>, in qux we will need to introduce the runtime branch, and in bar, we will need to propagate the zero-cost abstraction up the stack by changing the signature to template<kind k> bar(), which will employ the type system to enforce the zero-costness.
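Spelled out, the use sites would become something like this (a sketch following the pattern just described):

    template<kind k> int bar() { return foo<k>(random_int()); }
    int baz() { return foo<left>(random_int()); }
    int qux() {
        return random_kind() == left ? foo<left>(random_int())
                                     : foo<right>(random_int());
    }
Note that every caller of bar is now forced to make the same choice, which is how the template parameter climbs up the stack.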

You see this pattern appear everywhere with these zero cost abstractions (e.g. async/await, although in that case it’s not strictly necessary; after all, all subroutines are compiled to state machines, as that is essential to the operation of the callstack — otherwise returning to a caller would not be possible, but this requires the compiler to know exactly how the callstack is implemented on a particular platform, and that increases implementation costs).

So a technical decision related to machine-code optimization now becomes part of the code, and in a very intrusive manner, even though the abstraction — the algorithm in foo — has not changed. This is the very definition of accidental complexity. Doing that change at all use sites, all to support a local change, is especially painful in a large codebase; it's impossible when foo, or bar, is part of a public API, as it's a breaking change -- all due to some local optimization. Even APIs become infected with accidental complexity, all thanks to zero-cost abstractions!

What is the alternative? JIT compilation! But it has its own tradeoffs... A JIT can perform much more aggressive specialization for several reasons: 1. it can specialize speculatively and deoptimize if it was wrong; 2. it can specialize across shared-library calls, as shared libraries are compiled only to the intermediate representation, prior to JITting; and 3. it relies on a size-limited dynamic code cache, which prevents the code-size explosion we'd get if we tried to specialize aggressively AOT; when the code cache fills, it can decide to deoptimize low-priority routines. The speculative optimizations performed by a JIT address the theoretical issue with specialization: a JIT can perform a specialization even if it cannot decisively prove that the information is, indeed, known statically (this is automatic partial evaluation).

A JIT will, therefore, automatically specialize on a per-use-site basis; where possible it will elide the branch, and where it can't, it will keep it. It will even speculate: if at one use site (after inlining) it has so far only encountered `left`, it will elide the branch and deoptimize if later proven wrong (it may need to introduce a guard, which in this contrived example would cancel out the savings from eliding the branch, but in more complex cases it would be a win; also, there are ways to introduce cost-free guards -- e.g. by introducing reads from special addresses that cause segmentation faults if the guard trips, a fault which is then caught; OpenJDK's HotSpot does this for some kinds of guards).
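In rough terms, the speculatively specialized code at bar's call to foo might look like this hand-written sketch (deoptimize() is a made-up stand-in for the runtime's fallback path; this is an illustration, not actual JIT output):

    // After profiling has only ever observed k == left at this call site:
    int bar_specialized(kind k) {
        if (k != left)                    // the guard
            return deoptimize(k);         // hypothetical: bail out to the generic path
        return do_left(random_int());     // foo inlined, branch elided
    }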

For this reason, JITs also solve the trait problem on a per-use-site basis. A callsite that in practice only ever encounters a particular implementation — a monomorphic callsite — would become cost-free (by devirtualization and inlining), and those that don’t — megamorphic callsites — won’t.

So a JIT can give us the same "cost-freedom" without changing the abstraction and introducing accidental complexity. It therefore allows for more general abstractions that hide, rather than expose, accidental complexity. JITs have many other benefits, such as allowing runtime debugging/tracing "at full speed", but those are for a separate discussion.

Of course, a JIT comes with its own costs. For one, those automatic optimizations, while more effective than those possible with AOT compilation, are not deterministic — we cannot be sure that the JIT would actually perform them. It adds a warmup time, which can be significant for short-lived programs. It adds RAM overhead by making the runtime more complicated. Finally, it consumes more energy.

There's a similar tradeoff for tracing GC vs. precise monitoring of ownership and lifetime (where ownership and lifetime are not statically determined, Rust falls back to reference-counting GC, which is generally less effective than tracing), but this comment is already too long.

All of these make JITs less suitable for domains that require absolute control and better determinism (you won't get perfect determinism with Rust on most platforms due to kernel/CPU effects, nor if you rely on its refcounting GC, which is, indeed, more deterministic than tracing GC, but not entirely), or that are designed to run in RAM- and/or energy-constrained environments — the precise domains that Rust targets. But in all other domains, I think that cost-free abstractions are a pretty big disadvantage compared to a good JIT -- maybe not every cost-free abstraction (some tracking of ownership/lifetime can be helpful even when there is a good tracing GC), but certainly the philosophy of always striving for them. A JIT replaces the zero-cost-abstraction philosophy with a zero-cost-use philosophy: where the use of a very general abstraction can be made cost-free, it (usually) will be (or close to it); where it can't, it won't, but the programmer doesn't need to select a different abstraction for some technical, "accidental" reason. That JITs so efficiently compile a much wider variety of languages than AOT compilers can also demonstrates how well they abstract the compiler (or, rather, how well the languages that employ them do).

So we have two radically different philosophies; both are very well suited for their respective problem domains, and neither is generally superior to the other. Moreover, it's usually easy to tell to which of these domains an application belongs. That's why I like both C++/Rust and Java.


If it is a bad abstraction, sure. But a good abstraction is much, much easier to understand than God knows how many lines of code with God knows what program flow.

I think this is a bit of a small-scale, Conway's-law-style effect. You shield your ability to reason about the code from the complexity of other people's code by abstracting around it, thus increasing the total complexity with an ad-hoc abstraction layer.

Abstraction should be judged on whether it makes the system easier or harder to reason about: to see what it does, to predict the consequences of decisions, to diagnose the causes of behaviour.

Usually, the established abstractions of the domain are an excellent guide. Even if you think they are sub-optimal objectively, they are easier to reason about for experts in that field.

I like the article's language of containing the "damage" code can do. A C-style modularization technique I haven't seen used is long functions, with sections' variables' visibility limited by braces.
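Something like this (a made-up sketch of that style; the helper functions are hypothetical, declared only for illustration):

    int read_header();
    void validate_header(int);
    int compute_checksum();
    void store_checksum(int);
 
    void process_request() {
        // section 1: parse the header
        {
            int header_len = read_header();
            validate_header(header_len);
        }   // header_len is no longer visible past this brace
        // section 2: compute and record the checksum
        {
            int checksum = compute_checksum();
            store_checksum(checksum);
        }
    }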


But computational complexity is a separate concern from level of abstraction.

True. But the quality of an abstraction is also significant.

A poor abstraction is costly in complexity. In a chain of tools, even modest complexity multiplies and quickly becomes unmanageable. Whereas a clean abstraction (ideally) resets complexity to zero.


Agreed, with abstraction comes complexity... complexity causes problems too

I find bad abstractions to be a much greater liability than unabstracted code that could be combined into one function but isn't.

It's relatively easy to combine code and much harder to disentangle it
