> IMO the reason is simplicity, most languages don’t want to burden programmers with lifetimes and/or soft references.
I'm talking about languages that already use a GC as the primary heap management mechanism. Any kind of GC algorithm, whether tracing or refcounting, frees developers from thinking about lifetimes, but languages that primarily rely on GC and care about performance almost always pick tracing.
The only languages that choose a refcounting GC are either those that care little about performance (Python) or those that don't rely heavily on GC (Rust).
> Both largely for reasons of predictability, lower latency, lower memory use
Nope. Refcounting offers worse latency, worse predictability [1] (compared to modern low-latency tracing GCs, like ZGC), and worse throughput; it does offer a significantly lower memory footprint. The reason Rust chose refcounting is primarily because it doesn't rely on the GC heavily and so it might as well use a crude GC (and enjoy the lower effort of implementing it as well as the lower footprint), as a crude refcounting GC is usually better than a crude tracing GC. Furthermore, a sophisticated tracing GC is not a great fit for languages like Rust or C++ because it requires more elaborate code generation and some "magical" mutator thread behaviour, while low-level languages want the generated code to be as close to the source code as possible. This isn't so much predictability of performance, but predictability of the generated code and the behaviour of mutators (in terms of lowering the surprise of "why is my thread doing _that_ now?" when observed at a low level).
As for Swift, I think it picked a refcounting algorithm because it cares about memory footprint (most Swift applications run in memory-constrained environments) and because some unique language features reduce the cost of refcounting enough for a simple refcounting GC to be acceptable.
[1]: Again, not in principle (as both refcounting and tracing can converge to a similar behaviour) but as practiced by most refcounting GCs in industry use.
> In languages like C++ and Rust the majority of objects have trivial lifetimes and don't need reference counting at all.
Sure, that's why they can get away with an inefficient GC.
> Hence, even if per object the cost of RC may be higher than tracing, the fact that there are 1000x fewer objects to manage makes RC a huge win.
The "win" here is not RC at all but the fact that there's little reliance on GC and, of course, it's not really a "win" but something you buy by giving up on something else (i.e. a tradeoff), in this case abstraction ability (memory management details cannot be hidden as implementation details), and giving that up increases maintenance costs. So to get the footprint savings you have to pay -- either with performance, if you rely heavily on a refcounting GC, or with maintenance cost, if you don't. To get the performance you can choose to pay for RAM or for more development hours, but either way -- you have to pay.
Tracing GCs also manage to do no work for the vast majority of objects -- those with trivial lifetimes. They, too, don't manage the vast majority of objects (the allocator just bumps a pointer and the GC is never even aware that most objects exist), but they achieve this by not freeing dead objects promptly, which comes at the cost of spending more RAM.
In other words: good footprint, good performance, good abstraction (i.e. lower maintenance costs) -- pick two and pay by giving up on the third. C++/Zig/Rust pick the first two, Java/C#/Go pick the last two, and I guess Python picks the first and last.
There are, of course, other nuances; e.g. most refcounting GCs in common use leak memory as they are too crude to handle cycles, and not all tracing GCs are state-of-the-art.
> instruction pollution from all the refcount updates and their synchronization
If the language supports ownership tracking and non-synchronized Rc<> as in Rust, refcount updates ought to be rare and/or quick. I agree that this is very much an issue in languages with obligate "managed" memory such as Swift, and that tracing GC may sometimes be preferable to that.
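To make that concrete, a minimal Rust sketch (illustrative only): `Rc` uses plain, non-atomic count updates and the compiler keeps it on one thread (it is `!Send`), while `Arc` pays for atomic updates only where you actually opt into cross-thread sharing.

```rust
use std::rc::Rc;
use std::sync::Arc;

fn main() {
    // Rc: plain (non-atomic) counter, cheap to update; the type is !Send,
    // so the compiler stops you from sharing it across threads.
    let local = Rc::new(String::from("thread-local data"));
    let _local2 = Rc::clone(&local); // non-atomic increment

    // Arc: atomic counter, paying for synchronization only where you ask for it.
    let shared = Arc::new(String::from("shared data"));
    let handle = {
        let shared = Arc::clone(&shared); // atomic increment
        std::thread::spawn(move || println!("{shared}"))
    };
    handle.join().unwrap();
}
```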
> too much time spent freeing memory
If you're using Rc, you probably care about memory reclaim being deterministic, which tracing GC doesn't give you. You can also use arenas to free a bunch of objects all at once; this also addresses memory fragmentation and slow allocation to some extent.
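A hand-rolled sketch of the arena idea in Rust (a toy, not a production arena -- crates like typed_arena or bumpalo do this properly): allocate everything for one frame or request into a single structure, then free it all in one drop.

```rust
// A toy arena: objects are pushed into one Vec and all freed together
// when the arena is dropped, instead of being freed one by one.
struct Arena<T> {
    items: Vec<Box<T>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    // Hand out a reference tied to the arena's lifetime.
    fn alloc(&mut self, value: T) -> &T {
        self.items.push(Box::new(value));
        self.items.last().unwrap() // &Box<T> coerces to &T
    }
}

fn main() {
    let mut arena = Arena::new();
    for i in 0..1000 {
        let _node = arena.alloc(i * 2);
        // ... build up per-frame/per-request data ...
    }
} // the whole arena (and every object in it) is freed here, in one go
```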
> this meme needs to die. Garbage collection is not "super sophisticated reference counting", and in fact, can't even be used in most places where reference counting is used. It's not a substitution, or a superset, of reference counting.
I never said that garbage collection is sophisticated reference counting, so if you've seen that "meme," it wasn't here. But reference counting and tracing are two classical approaches to garbage collection, the garbage collection literature does treat reference counting as a garbage collection algorithm, and refcounting is used for garbage collection in languages such as Rust. Refcounting has other uses, too, but this discussion is about garbage collection and not about 101 uses for refcounting.
Anyway, reference counting GC is not intrinsically more primitive or worse-performing than tracing, but in practice, tracing GCs tend to be sophisticated, while languages that rely on refcounting usually employ a rather simple algorithm. In fact, this is one of the attractions of refcounting for GC: it is easier to get OK performance with primitive refcounting than with primitive tracing.
> Sometimes ref-counting (Rc<T> in Rust) is fine, although that's more expensive than GC.
To nitpick: The jury's still out on that one, because Rc in Rust isn't thread-safe reference counting. I believe that non-thread-safe reference counting is quite competitive with global, cross-thread tracing GC.
When people (rightly) talk about how much slower reference counting is than tracing GC, they're almost always talking about either thread-safe tracing GC vs. thread-safe reference counting or single-threaded tracing GC vs. single-threaded reference counting. When you compare Rc to a typical multithreaded GC'd language, you're comparing multithreaded tracing GC to single-threaded reference counting, which is a much more interesting comparison.
> I wouldn't consider reference counted GC systems level appropriate. It _can_ be more deterministic, but not when it's tracing and collecting cycles (which a decent modern RC implementation must do)
Swift has what you would call an ‘indecent’ RC design, then, because it doesn’t.
> and it usually plays havoc on L1 cache (by touching counts).
Swift (like Objective-C) can store the refcounts either inline or in supplementary structures, for performance.
> Now if the garbage collection was optional
In the common case, it practically is, as (most) value types will only ever live on the stack, or inline in another data structure.
> One way to look at reference counting is as a sort of eager, optimized form of garbage collection that works in the case that you have strictly acyclic data (or can tolerate leaking cycles).
It's interesting to note that python operates with the refcounting + tracing hybrid described (and I know of production deployments where they force the tracing collector to e.g. only run once every N requests because python's never got that fast either).
perl, meanwhile, is pure refcounting but has weak references so you can make data acyclic from a refcounting POV at the cost of having to pay attention to which parts of the cyclic structure you keep references to (and for cyclic object graphs there are some tricks you can pull to move the weak link around so you don't have to pay such attention, but that would be a whole comment on its own).
koka does refcounting in theory but tries to lift as much of it to compile time as possible in practice; plus, if you do e.g. 'y = x + 1' and x has only a single reference to it and is guaranteed not to be used afterwards, it re-uses x's storage for y and so can do in-place mutation.
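Rust exposes a related trick explicitly rather than automatically -- this is just an illustrative sketch of the same "reuse when the count is 1" idea, not how koka implements it:

```rust
use std::rc::Rc;

fn main() {
    let x = Rc::new(vec![1, 2, 3]);

    // Refcount is 1, so make_mut mutates the existing allocation in place --
    // roughly the reuse koka performs for you at compile time.
    let mut y = x;
    Rc::make_mut(&mut y).push(4);

    // With a second reference alive, make_mut must clone instead.
    let alias = Rc::clone(&y);
    let mut z = y;
    Rc::make_mut(&mut z).push(5);
    assert_eq!(alias.len(), 4);
    assert_eq!(z.len(), 5);
}
```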
nim, meanwhile, offers something called ORC, which is automatic reference counting plus an implementation of Bacon+Rajan's Recycler algorithm which is designed to collect -only- cycles in an otherwise refcounting based system and as such is really rather fast at doing so.
Coming back to Rust: There's a stop the world Recycler implementation (rather than concurrent as intended both by the original paper and the author's future plans if the itch returns) here - https://github.com/fitzgen/bacon-rajan-cc - released as the bacon-rajan-ccc crate from this fork - https://github.com/mbartlett21/bacon-rajan-cc - and the README for those contains an Alternative section linking to other experiments in the same area.
(I've yet to play with any of the rust versions but love Recycler sufficiently as a concept that I suspect if/when I encounter a situation where it might help I'll be going back and doing so)
If you're having trouble finding the Recycler papers, I stashed a copy of each under https://trout.me.uk/gc/ (and if you're a similar sort of strange to me, you might also enjoy the ones stashed under https://trout.me.uk/lisp/ ;)
> Rust's GC (...) has worse performance [than tracing GC]
That's an unsubstantiated claim.
While reference counting may be slower than tracing in the general case, Rust's reference counting is not the same thing as general reference counting in languages like e.g. Swift or Python. There are two reasons:
- The vast majority (99%+) of objects in Rust don't need to be managed by reference counting at all.
- Rust relies heavily on move semantics to avoid updating the ref-counts, so the vast majority of refcounted objects have very few (sometimes just 1) refcount updates.
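A quick sketch of that second point: moving an Rc hands over the existing count, so a value can pass through several owners with a single increment at creation and a single decrement at the end.

```rust
use std::rc::Rc;

struct Node {
    payload: Rc<String>, // shared data stored by moving the Rc in, not cloning it
}

fn store(payload: Rc<String>) -> Node {
    // `payload` is moved into the struct: no refcount update here.
    Node { payload }
}

fn main() {
    let data = Rc::new(String::from("hello")); // count = 1
    let node = store(data);                    // moved twice, count still 1
    assert_eq!(Rc::strong_count(&node.payload), 1);
}
```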
> reference counting gives better performance predictability than a tracing GC, but that is no longer true, certainly compared to the generational ZGC presented here.
Where does ZGC claim nanosecond latency?
But even if it did, it would still be worse, because refcounting doesn't pause at all. A pause means stopping all application threads; one thread spending X nanoseconds freeing memory does not count as a pause, because the rest of the system can still make progress.
> There's a reason almost no widely used language implementation relies on ref-counting. CPython is the only real exception and they would switch to tracing if they could. They can't because they exposed deterministic finalization to the user which means now their GC strategy is a language "feature" that they're stuck with.
Reference counting and address-semantics (which prohibits moving objects, so even if you work around the refcount, you can only do a tracing GC, but you don't get to compact) are deeply ingrained into CPython's C API as well, which is very widely used.
> On the other hand reference counting is extremely slow. So slow that it puts a lot of restrictions on API design and that is definitely a mental burden.
This is very wrong on both points. Reference counting has a lot less mental burden compared to GC, because it's predictable. You can rely on destructors to actually work and be useful. For example, you don't have to keep track of whether you closed your file descriptor or not, but in GCd languages you have to do that manually, sometimes even with the same reference counting, but hand-written and explicit. Reference counting is also very fast, with the exception of decades-old concurrent implementations, which are irrelevant today.
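For example, a minimal Rust sketch of the destructor point: when the last refcounted handle goes away, the destructor runs at that exact moment, so the file descriptor is closed with no manual bookkeeping and no waiting for a collector.

```rust
use std::fs::File;
use std::rc::Rc;

fn main() -> std::io::Result<()> {
    let log = Rc::new(File::create("/tmp/example.log")?);
    let alias = Rc::clone(&log); // a second owner of the same file handle

    drop(log);   // count 2 -> 1: nothing to clean up yet
    drop(alias); // count 1 -> 0: File's destructor runs right here and closes the fd
    Ok(())
}
```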
> Perhaps the biggest misconception about reference counting is that people believe it avoids GC pauses. That's not true.
When I can easily replace the deallocator (thus excluding most non-RC production GCs), I can (re)write the code to avoid GC pauses (e.g. by amortizing deallocation, destructors, etc. over several frames - perhaps in a way that returns ownership of some type and its allocations to the type's originating thread, reducing contention while I'm at it!). I have done this a few times. By "coincidence", garbage generation storms causing noticeable delays are surprisingly uncommon IME.
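A sketch of what moving deallocation off the hot path can look like in Rust -- the names and the frame loop are made up for illustration; the point is just that ownership lets you decide where and when the frees happen:

```rust
use std::sync::mpsc;
use std::thread;

// Anything sent down this channel is deallocated on the dropper thread,
// keeping the (hypothetical) frame loop free of large deallocations.
fn spawn_dropper<T: Send + 'static>() -> mpsc::Sender<T> {
    let (tx, rx) = mpsc::channel::<T>();
    thread::spawn(move || {
        for item in rx {
            drop(item); // the actual free happens here, off the hot path
        }
    });
    tx
}

fn main() {
    let defer_drop = spawn_dropper::<Vec<u8>>();
    for _frame in 0..3 {
        let big = vec![0u8; 8 * 1024 * 1024]; // per-frame scratch data
        // ... use `big` ...
        defer_drop.send(big).unwrap(); // hand it off instead of dropping inline
    }
}
```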
As programs scale up and consume more memory, "live data" outscales "garbage" - clever generational optimizations aside, I'd argue the former gets expensive more quickly, and is harder to mitigate.
It's also been my experience that tracing or profiling any 'ole bog standard refcounted system to find performance problems is way easier and more straightforward than dealing with the utter vulgarity of deferred, ambiguously scheduled, likely on a different thread, frequently opaque garbage collection - as found in non-refcounted garbage collection systems.
So, at best, you're technically correct here - which, to be fair, is the best kind of correct. But in practice, I think it's no coincidence that refcounting systems tend to automatically and implicitly amortize their costs and avoid GC storms in just about every workload I've ever touched, and at bare minimum, reference counting avoids GC pauses... in the code I've written... by allowing me easier opportunities to fix them when they do occur. Indirectly causal rather than directly causal.
> if you read a heap-object out of a data structure, you have to increment the object's reference count.
This isn't universal. Merely accessing and dereferencing a shared_ptr in C++ won't touch the refcount, for example - you need to copy the shared_ptr to cause that. Rust's Arc/Rc need to be clone()d to touch the refcount, and the borrow checker reduces much of the need to do such a thing defensively, "in case the heap object is removed out from under me".
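Concretely, in Rust (an illustrative sketch): reading a refcounted value out of a data structure only borrows it; the count moves only when you explicitly decide to keep a new handle.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Reading out of the map only borrows; no refcount traffic.
fn title_len(cache: &HashMap<u32, Rc<String>>, id: u32) -> Option<usize> {
    cache.get(&id).map(|title| title.len())
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(1, Rc::new(String::from("Hello")));

    let n = title_len(&cache, 1);      // count untouched
    let kept = Rc::clone(&cache[&1]);  // the only increment: we chose to keep a handle
    println!("{n:?} {}", Rc::strong_count(&kept)); // Some(5) 2
}
```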
Of course, it can be a problem if you bake refcounting directly into language semantics and constantly churn refcounts for basic stack variables while failing to optimize said churn away. There's a reason why many GCed languages don't use reference counting to optimize the common "no cycles" case, after all - often, someone tried it out as an obvious and low hanging "optimization", and found it was a pessimization that made overall performance worse!
And even without being baked into the language, there are of course niches where heavy manipulation of long-term storage of references will be a thing, or cases where the garbage collected version can become lock-free in a context where such things actually matter - so I'll 100% agree with you on this:
> There are no hard lines; this is about performance tradeoffs, and always will be.
> Reference counting is a form of GC, the difference being that the programmer has full control over when it runs. Most real time systems preallocate everything and never deallocate.
I really don't think Swift is suitable for real time embedded systems, if that's what you suggest. Even C++ is often unsuitable. Rust might have potential one day, if not already.
> One typical problem with GC systems is common patterns and library code cause allocations, which cause the GC to run - consuming cycles and causing jitter even if nothing is freed.
Any library code can allocate or not allocate regardless of language or memory allocation system.
> (Note that even calling malloc() is non-deterministic and best avoided in real time code.)
> Are 16/24 bytes that much worse than 8/16 (with weak refs) for ref counting?
No, but that's not the right question. Most allocations in languages like C/C++/Rust aren't using ref counts. In fact excessive use of std::shared_ptr is a well known footgun in C++!
> the ability to use non-garbage collected arenas would mitigate that issue
I agree that not using GC is a way to mitigate GC issues! But if you're only able to performantly use GC for a limited number of allocations you also aren't getting much benefit from GC.
> People recommend tracing garbage collection because it performs better than atomic reference counting, even though reference counting is much simpler.
Reference counting is a lot more complicated! It turns every read into a write.
> [...] if my goal is performance and predictable latency?
Reference counters don't have predictable latency: a single reference decrement can lead to an arbitrary amount of follow up work, like more decrements and deallocations.
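To illustrate that follow-up work with a small Rust sketch: dropping one handle to the head of a refcounted list triggers a chain of decrements and frees proportional to the list's length (and the naive recursive drop can even blow the stack for long enough lists).

```rust
use std::rc::Rc;

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

fn main() {
    // Build a list of 10_000 refcounted nodes.
    let mut head = Rc::new(List::Nil);
    for i in 0..10_000 {
        head = Rc::new(List::Cons(i, head));
    }

    // This single drop decrements and frees every node in the chain,
    // one after another: the cost of "one" decrement is O(list length).
    drop(head);
}
```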
Performance of reference counting isn't generally better than that of garbage collectors. (Especially since you still need a backup solution to deal with cycles.)
Figuring out a lot of your memory management at compile time is definitely a win. But doesn't really have much to do with reference counting.
> If you limit yourself to boxed and stack variables, which is a pretty big limit when writing larger programs.
That's more or less how most C or C++ programs work, however. C and C++ programs (including the Linux kernel) use reference counting when the object lifetime is truly dynamic. For example, file objects going back to the very first Unix have had to be reference counted, because they can be duplicated via the dup() system call, inherited via fork(), or sent from process to process via cmsg.
The big advantage of the C/C++/Rust approach comes from the observations that (a) the vast majority of objects only have one owner; (b) reference counting does not have a global performance impact when used only for a small subset of objects.
> And as noted by a sibling comment, you can create a memory leak due to cycles if you use it imprudently - do we consider that to be a safe use of memory?
It's a tradeoff: reference counting is bad at cycles, but has advantages due to prompt deallocation, avoidance of the mark phase, and (in Rust) avoids unnecessary cross-thread synchronization. I don't think it's unsafe, because leaks can happen in any language, including garbage-collected ones: you can have arrays in global variables that grow without limit, for example.
Remember that the criterion that GC uses—no more references to an object—is ultimately a heuristic approximating what you really want, which is whether the program will access the data in question again. Of course, due to the halting problem, we can't actually solve that problem, so all garbage collection algorithms approximate it, with the requisite possibility of leaks.
> A GC system with explicitly visible reference counts (and immediate freeing) with language support to make it easier to get the refcounts right [...]
To be a little pedantic on the subject, such a system (reference counting and immediate freeing) is a form of automatic memory management, but it is not GC in any way. Garbage collection implies that the system leaves garbage around, which needs to be collected in some way or another. The usual approach to refcounting releases resources as soon as they are no longer required (either by free()ing immediately or by sending them to a pool of unused resources), thus doesn't leave garbage around and doesn't need a collector thread or mechanism to collect it.
There are partial-GC implementations of refcounting, either because items are not free()d when they reach zero references, or to automatically detect reference loops which are not handled directly.
I agree with Torvalds on this matter. GC as it is promoted today is a giant step that gives programmers one benefit, solving one problem, while introducing an immeasurable pile of complexity to the system, creating another pile of problems that are still not fixed today. And to fix some of these problems (like speed) you have to introduce more complexity.
This is my problem with GC. I like simplicity. Simplicity tends to perform well, and being simple also means it has little space for problems. Refcounting is simple and elegant; you just have to take care of reference loops, which also has a simple solution: weak references. I can teach a class of CS students everything they need to know to design a refcounting resource management system in one lesson.
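The weak-reference fix, sketched in Rust (the same idea works with Swift's weak references or Python's weakref): parents own children through strong counts, children point back through weak references, so the cycle never keeps anything alive.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    parent: RefCell<Weak<Node>>,      // back-edge: does not keep the parent alive
    children: RefCell<Vec<Rc<Node>>>, // forward edges: strong, owning
}

fn main() {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));

    // Only the strong edge parent -> child counts; the back-edge is weak,
    // so dropping `parent` and `child` frees the whole tree with no leak.
    assert_eq!(Rc::strong_count(&child), 2);
    assert_eq!(Rc::weak_count(&parent), 1);
}
```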
GC is the opposite: it is big, complex, and a problem that the more you try to fix it, the more complex it becomes. The original idea is simple, but nobody uses the original idea because it performs so badly. To teach the same class how to design a GC system that performs as well as we expect today, an entire semester may not be enough.
> His contention is that languages with GC don't require reference count updates, and so manipulating the BDD graph is much cheaper in these languages since the resource accounting overhead is deferred by some non-deterministic time.
To be clear, the "contention" is the well-known problem that reference counting (especially atomic reference counting) is one of the most expensive automatic memory management techniques in existence, and it takes pretty significant efforts to get it on par with a modern tracing collector. That's because reference counting has high overhead and poor cache locality (as it comes with a built-in pointer chasing requirement even when you assign a reference).
That the cost is deferred has nothing to do with it.
1 - The tradeoff between latency, throughput and memory-use of tracing GCs is universal and language-independent. Some implementations may be better than the others, but I haven't seen an implementation that is good in all three aspects.
2 - I've seen this repeated on the Internet many times, but there is little scientific evidence, and the one available is quite contradictory. I recall a paper where authors argued that replacing the old-gen collector with reference counting actually improved performance. So this is definitely not universally true, and depends on many factors such as imposed constraints on memory overhead and usage patterns. It is easy to make GC good when you only create short-lived objects and allow it to use 10x more memory than needed. But things get pretty hairy when you also want low-pauses and low memory overhead and objects live longer (e.g. cache-like use cases). Also the number of reference counted objects in C++/Rust programs is typically orders of magnitude lower than the number of GCed objects in GCed languages, so even if ref-counting was less efficient per-object lifetime, it is not globally less efficient.
3 - Agree, but "value types" have been present in C/C++/Rust/Swift/D/C# and dozens of other languages for ages, so, unless you're forced to work on a legacy Java project, why wait? You can become productive in Rust in less time than they will be able to ship Valhalla.
> Basically, you attach the reference to the object graph once, and then free it when you're done with it.
So reference counting works by the programmer knowing the lifetime of each object, allowing them to increment / decrement the refcount only once, and trusting that the raw uncounted pointers they use elsewhere are always valid? There's another word we have for this: manual memory management. It's unsafe and unergonomic, and it's pretty telling that the author needs this pattern to make RC appear competitive. It's because actually doing reference counting safely is really expensive.
> If GC is so good, why wouldn't Python just garbage collect everything, which they already did once and could trivially do, instead of going through the hassle of implementing reference counting for everything but the one case I mentioned?
Because they've made reference counting a part of their C extension API and ABI. If they wanted to use a GC, they'd instead need a very different API, and then migrate all the modules to the new API. (I.e. a way for those native extension to register/unregister memory addresses containing pointers to Python objects for the GC to see.)
Early on the deterministic deallocation given by reference counting would also have been treated by programmers as a language level feature, making it so that a migration would have broken working code. But I don't think that was ever actually guaranteed in the language spec, and anyway this was not carried over to various alternative Python implementations.
> the cost of decrementing/incrementing these counters is very real
Only true if you use pervasive reference counting, with atomic updates. Swift essentially does this, but other languages don't.
One of the reasons why it makes sense to single out tracing GC (as opposed to RC) is that it's essentially a package deal; your whole program needs to be structured to use memory as essentially an in-RAM database, where every single object or reference can be traced after the fact at some arbitrary time in the future. (There are reasons to do this in some cases, even in 'low-level' languages which are not ordinarily garbage collected - see e.g. the use of "entity component systems" in such languages - but it's obviously not a generally-appropriate solution in any real sense!). RC is self-contained, and it need not even be locally inefficient.
> You're talking about gradually typed languages.
Interestingly, gradual typing does seem to end up with a "worst of both worlds"-type situation - the dynamic tag checks and conversions that have to be inserted at every boundary between statically- and dynamically-typed parts of the program introduce pervasive inefficiency, which is only resolved when static typing is used essentially throughout.