
> instruction pollution from all the refcount updates and their synchronization

If the language supports ownership tracking and non-synchronized Rc<> as in Rust, refcount updates ought to be rare and/or quick. I agree that this is very much an issue in languages with obligate "managed" memory such as Swift, and that tracing GC may sometimes be preferable to that.

> too much time spent freeing memory

If you're using Rc, you probably care about memory reclaim being deterministic, which tracing GC doesn't give you. You can also use arenas to free a bunch of objects all at once; this also addresses memory fragmentation and slow allocation to some extent.




> IMO the reason is simplicity, most languages don’t want to burden programmers with lifetimes and/or soft references.

I'm talking about languages that already use a GC as the primary heap management mechanism. Any kind of GC algorithm, whether tracing or refcounting, frees developers from thinking about lifetimes, but languages that primarily rely on GC and care about performance almost always pick tracing.

The only languages that choose a refcounting GC are either those that care little about performance (Python) or those that don't rely heavily on GC (Rust).

> Both largely for reasons of predictability, lower latency, lower memory use

Nope. Refcounting offers worse latency, worse predictability [1] (compared to modern low-latency tracing GCs, like ZGC), and worse throughput; it does offer a significantly lower memory footprint. The reason Rust chose refcounting is primarily that it doesn't rely on the GC heavily, so it might as well use a crude GC (and enjoy the lower implementation effort as well as the lower footprint), as a crude refcounting GC is usually better than a crude tracing GC. Furthermore, a sophisticated tracing GC is not a great fit for languages like Rust or C++ because it requires more elaborate code generation and some "magical" mutator thread behaviour, while low-level languages want the generated code to be as close to the source code as possible. This isn't so much predictability of performance as predictability of the generated code and of the mutators' behaviour (in terms of lowering the surprise of "why is my thread doing _that_ now?" when observed at a low level).

As for Swift, I think it picked a refcounting algorithm because it cares about memory footprint (most Swift applications run in memory-constrained environments) and because some unique language features reduce the cost of refcounting enough for a simple refcounting GC to be acceptable.

[1]: Again, not in principle (as both refcounting and tracing can converge to a similar behaviour) but as practiced by most refcounting GCs in industry use.


> I wouldn't consider reference counted GC systems level appropriate. It _can_ be more deterministic, but not when it's tracing and collecting cycles (which a decent modern RC implementation must do)

Swift has what you would call an ‘indecent’ RC design, then, because it doesn’t.

> and it usually plays havoc on L1 cache (by touching counts).

Swift (like Objective-C) can store the refcounts either inline or in supplementary structures, for performance.

> Now if the garbage collection was optional

In the common case, it practically is, as (most) value types will only ever live on the stack, or inline in another data structure.


> In languages like C++ and Rust the majority of objects have trivial lifetimes and don't need reference counting at all.

Sure, that's why they can get away with an inefficient GC.

> Hence, even if per object the cost of RC may be higher than tracing, the fact that there are 1000x fewer objects to manage makes RC a huge win.

The "win" here is not RC at all but the fact that there's little reliance on GC. And of course it's not really a "win" but something you buy by giving up on something else (i.e. a tradeoff) -- in this case abstraction ability (memory management details cannot be hidden as implementation details), and giving that up increases maintenance costs. So to get the footprint savings you have to pay: either with performance, if you rely heavily on a refcounting GC, or with maintenance cost, if you don't. To get the performance you can choose to pay for RAM or for more development hours, but either way, you have to pay.

Tracing GCs also manage not to do any work for the vast majority of objects, those with trivial lifetimes -- the allocator just bumps a pointer, and the GC is never even aware that most objects existed. They achieve this by not freeing dead objects promptly, which comes at the cost of spending more RAM.

In other words: good footprint, good performance, good abstraction (i.e. lower maintenance costs) -- pick two and pay by giving up on the third. C++/Zig/Rust pick the first two, Java/C#/Go pick the last two, and I guess Python picks the first and last.

There are, of course, other nuances; e.g. most refcounting GCs in common use leak memory as they are too crude to handle cycles, and not all tracing GCs are state-of-the-art.


> Languages like Swift and Objective-C hide the reference counting for you, so 'instruction pollution' is not really something I care about. The overhead at the instruction & memory level is fairly minimal for these languages too, by means of using tagged pointers. I'm pretty sure the compiler is smart enough to factor out reference counting for complete sections where it can determine objects cannot possibly go out of scope (e.g. sections where aliasing can be ruled out, and all assignments are to locals, such as in many loops).

The new Swift Ownership API can help, but the compiler is not good enough to figure out exclusive ownership all by itself without any annotations. It is common for more than 10% of your time in an RC environment to be spent on refcount updates and lock acquisition (some stats: http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf).


> > - no sharing across threads, otherwise locking is required for refcount updates
>
> That is not true. Most atomic refcount implementations are lockfree (you do need synchronization though). This optimization has nothing to do with tracing. Given that you also need synchronization for tracing garbage collection (including frequent stop the world pauses in most cases, albeit brief ones), I don't think this is even really an advantage of tracing at all.

Not sure I agree with that. Copying references between local variables and reads/writes from the heap all require expensive atomic operations for RC when they can't be optimized away. That's a major performance problem for languages like Swift. I'm not saying that one is better than the other, but this is exactly where tracing GCs shine compared to RC.

In the case of ZGC these don't require atomic operations, though you do need a read barrier when reading references from the heap. But do not conflate tracing GCs' read & write barriers with atomic barriers.


> the cost of decrementing/incrementing these counters is very real

Only true if you use pervasive reference counting, with atomic updates. Swift essentially does this, but other languages don't.

One of the reasons why it makes sense to single out tracing GC (as opposed to RC) is that it's essentially a package deal; your whole program needs to be structured to use memory as essentially an in-RAM database, where every single object or reference can be traced after the fact at some arbitrary time in the future. (There are reasons to do this in some cases, even in 'low-level' languages which are not ordinarily garbage collected - see e.g. the use of "entity component systems" in such languages - but it's obviously not a generally-appropriate solution in any real sense!). RC is self-contained, and it need not even be locally inefficient.

> You're talking about gradually typed languages.

Interestingly, gradual typing does seem to end up with a "worst of both worlds"-type situation - the dynamic tag checks and conversions that have to be inserted at every boundary between statically- and dynamically-typed parts of the program introduce pervasive inefficiency, which is only resolved when static typing is used essentially throughout.


> Sometimes ref-counting (Rc<T> in Rust) is fine, although that's more expensive than GC.

To nitpick: The jury's still out on that one, because Rc in Rust isn't thread-safe reference counting. I believe that non-thread-safe reference counting is quite competitive with global, cross-thread tracing GC.

When people (rightly) talk about how much slower reference counting is than tracing GC, they're almost always talking about either thread-safe tracing GC vs. thread-safe reference counting or single-threaded tracing GC vs. single-threaded reference counting. When you compare Rc to a typical multithreaded GC'd language, you're comparing multithreaded tracing GC to single-threaded reference counting, which is a much more interesting comparison.


> Reference counting is a form of GC, the difference being that the programmer has full control over when it runs. Most real time systems preallocate everything and never deallocate.

I really don't think Swift is suitable for real time embedded systems, if that's what you suggest. Even C++ is often unsuitable. Rust might have potential one day, if not already.

> One typical problem with GC systems is common patterns and library code cause allocations, which cause the GC to run - consuming cycles and causing jitter even if nothing is freed.

Any library code can allocate or not allocate regardless of language or memory allocation system.

> (Note that even calling malloc() is non-deterministic and best avoided in real time code.)

Yeah, malloc is a pretty slow thing.


Garbage collected languages are still better-suited for certain things. You can make most of those things work well enough in Rust by using Rc<>, but a) it's really clunky, and b) reference-counting has caveats, like the fact that it doesn't work for circular references. Swift solves (a), but not (b). On top of that, for programs where you need to constantly allocate/deallocate, I believe (could be wrong on this) that amortized over time, sweeping GC is quite a bit more efficient than reference counting. This is extra relevant once concurrency gets involved, because now your reference-counters have to use an atomic or locking mechanism.

In short: I'm pretty sure GC isn't going anywhere, even if some of its usecases start to get covered by non-GC'd languages.


> If the ref counting is manual or optimized, it doesn't happen after every assignment, only ones which change object ownership.

The Rust compiler does this. Even so, 19% of the binary size in rustc is adjusting reference counts.

I am not exaggerating this. One-fifth of the code in the binary is sitting there wasted adjusting reference counts. This is much of the reason we're moving to tracing garbage collection.

All of those optimizations that you're talking about (for example, packing reference counts into the class pointer) will make adjusting reference counts take even more code size.

> As for not allowing cycles, I consider that a feature. The headache of memory management doesn't go away with GC.

The fact that memory management is still an issue doesn't mean that we shouldn't make it work as well as we can.

In large software projects, cycles are a persistent issue. Gecko had to add a backup cycle collector over its reference counting framework, and observed memory usage decreased significantly once it was implemented.


> Rust's GC (...) has worse performance [than tracing GC]

That's an unsubstantiated claim.

While reference counting may be slower than tracing in the general case, Rust's reference counting is not the same thing as general reference counting in languages like e.g. Swift or Python. There are two reasons:

- The good majority (99%+) of objects in Rust don't need to be managed by reference counting at all.

- Rust relies heavily on move semantics to avoid updating the ref-counts, so the good majority of refcounted objects have very few (sometimes just 1) refcount updates.

> reference counting gives better performance predictability than a tracing GC, but that is no longer true, certainly compared to the generational ZGC presented here.

Where does ZGC claim nanosecond latency? But even if it did, it would still be worse, because refcounting doesn't pause at all. A pause means stopping all application threads; spending X nanoseconds freeing memory on one thread does not count as a pause, because the rest of the system can still make progress.


> Most production-quality ref counting implementations defer tracking references on the stack to minimize ref count churn. That significantly improves performance but it means you can no longer claim that the system always reclaims memory "immediately".

Swift doesn't do this, if I understood what you meant properly. I suppose you might want to defer freeing some things to idle time but it's usually not a problem in my experience. I'm honestly surprised it always comes up.

There are "autorelease pools" in ObjC, but they aren't used to move releasing to idle time; rather, they handle cases where nobody knows the precise lifetime of the allocation. Swift has a better ABI that doesn't have that issue.


>This isn’t formal science.

Computer science is absolutely a formal science in every sense of the word.

>If Boehm GCs really leak as much as you claim then nobody would use them.

I mean, if anything this argument applies more to your characterization that reference counting easily leads to memory exhaustion due to cycles than it does to conservative garbage collectors. If RC-based garbage collectors leaked as much as your post implies, then nobody would use them. And yet Swift, a language that is almost exclusively used with large object graphs (that's how most UIs are represented in memory) and that exhibits all kinds of non-deterministic shared ownership, uses RC-based garbage collection. Objects involved in UI code are shared in wildly unpredictable ways, with no clear hierarchy among layouts, event handlers, observables, IO systems, etc...

>A definition of GC that excludes this capability does not have the contract of “You only have to worry about reachability from roots”.

Exactly, because that's not part of the contract for a GC. That's part of the contract for a tracing garbage collector:

https://en.wikipedia.org/wiki/Tracing_garbage_collection

When your definitions don't fit reality then it's time to adjust your definitions, not reality.


> this meme needs to die. Garbage collection is not "super sophisticated reference counting", and in fact, can't even be used in most places where reference counting is used. It's not a substitution, or a superset, of reference counting.

I never said that garbage collection is sophisticated reference counting, so if you've seen that "meme," it wasn't here. But reference counting and tracing are two classical approaches to garbage collection, the garbage collection literature does treat refcounting as a garbage collection algorithm, and refcounting is used for garbage collection in languages such as Rust. Refcounting has other uses, too, but this discussion is about garbage collection and not about 101 uses for refcounting.

Anyway, reference counting GC is not intrinsically more primitive or worse-performing than tracing, but in practice, tracing GCs tend to be sophisticated, while languages that rely on refcounting usually employ a rather simple algorithm. In fact, this is one of the attractions of refcounting for GC: it is easier to get OK performance with primitive refcounting than with primitive tracing.


> - no sharing across threads, otherwise locking is required for refcount updates

That is not true. Most atomic refcount implementations are lockfree (you do need synchronization though). This optimization has nothing to do with tracing. Given that you also need synchronization for tracing garbage collection (including frequent stop the world pauses in most cases, albeit brief ones), I don't think this is even really an advantage of tracing at all.

> - no deep datastructures, otherwise the call stack of cascading deletions will produce a stop similar to tracing GC

If your program has data structures that live through several collections, refcounting can be faster than tracing GC because it only traces objects once (when they die) instead of several times. Additionally, you can defer the refcount updates in ways that avoid the need to do lots of work on deallocation, and optimize for short-lived objects, to get many of the effects of generational GC. This also has little to do with tracing (except inasmuch as a reference counter with this optimization has to "trace" new objects to make them initially live, but in some sense this work to update the reference counter for the first time is just moved from object initialization time to a later point in the program). The reason it's not usually done is that it requires precise liveness information by an ambient collector, which complicates the implementation, but "exploiting liveness information" is not the same as "being a tracing GC."

> - implementations need to be clever about nested deletions when refcount reaches 0, otherwise a stack overflow might happen

This is extremely trivial to avoid (if you need to) and has nothing to do with tracing. It's really more a product of user-defined destructors than anything else, which you don't need to provide in order to implement reference counting. In fact, a lot of the supposedly inevitable slowness of reference counting compared to tracing goes away when you ban nontrivial destructors (and conversely, nontrivial destructors make tracing perform much worse).

> This is only relevant for naive refcount implementations, there are many optimisations, which endup turning a refcounting implementation into a tracing GC in disguise.

The most optimized versions of refcounting I'm aware of get their wins from things like precise knowledge about live references and update coalescing--not tracing, except for some young object optimizations as I alluded to above which are more about satisfying the generational hypothesis (they generally have a backup tracer in order to break cycles, but if you're willing to live without that they usually don't need tracing to reclaim all memory). Many optimizations people commonly associate with tracing (e.g. cache friendliness due to compaction) are in practice only relevant for young generations most of the time; for older ones both tracing and reference counted implementations tend to benefit more from a really smart allocator with intelligent memory layout and partial reclamation.

I agree that optimized versions of refcounting are difficult and the fact that you still need a backup tracer for most code discourages people from using it, as well as that tracing doesn't negate a lot of systems optimizations. But a lot of the stuff you're saying is pretty misleading: the things that make tracing efficient can mostly be applied to make reference counting efficient without "turning it into tracing," with the cost that optimized tracing and optimized reference counting both need much more invasive knowledge about the user program (and correspondingly restrict the users) compared to the less optimized versions.


Not really, only marketing makes RC seem better than tracing GC.

https://github.com/ixy-languages/ixy-languages

https://forums.swift.org/t/swift-performance/28776

It is no wonder that Swift 5.5 brings aggressive compiler optimisations that can even break code that naively relies on basic ARC behaviour.

RC also has pauses, pretty lengthy ones with cascade deletions, and can even lead to stack overflows.

CppCon 2016: Herb Sutter, "Leak-Freedom in C++... By Default"

https://youtu.be/JfmTagWcqoE

Which is why, as a GC algorithm, reference counting is only for toy implementations.

Any reference counting implementation that cares about performance in multicore environments is almost indistinguishable from a tracing GC implementation.

In fact, C++/WinRT uses background threads to diminish the performance impact of calling destructors and cascading deletions.

https://devblogs.microsoft.com/oldnewthing/20191018-00/?p=10...


> A ton of energy’s been poured into research for garbage collection, particularly since Java has come up. There’s been hundreds of papers written in the academic circles, tons of work in HotSpot and other Java [2:03:30] implementations to do different tweaks and different tunings and different new kinds of algorithms in garbage collecting. That work really hasn’t been done for ARC yet, so really, I think there’s still a a big future ahead.

I disagree strongly; there has been plenty of work in optimising reference counting systems. A quick look through Jones, Moss and Hosking's Garbage Collection Handbook provides many descriptions of high performance reference counting, but they all use deferred reference counting [1], rather than the immediate reference counting that Swift/Objective-C/Apple uses. That Apple is stuck in 1975 does not say much about other reference counting users.

(Though Lattner only says "ARC", and I believe only Apple calls their refcounting "ARC"; I guess Rust has "Arc" as well, but that is a different thing completely. Else I haven't seen the acronym used elsewhere.)

> But to do that, they use this tricolor algorithm which dramatically lowers throughput, because they’re doing almost exactly the same kinds of operations that ARC is doing.

Concurrent tracing algorithms often log into a local buffer, without synchronisation. Whereas concurrent immediate reference counting performs atomic updates - it's almost exactly the same except for the parts where they really aren't the same.

High performance concurrent RC systems also use local buffers [2] to avoid atomic updates too. Go figure.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63....

[2] https://sites.cs.ucsb.edu/~ckrintz/racelab/gc/papers/levanon... page 5, figure 2 in particular


> One way to look at reference counting is as a sort of eager, optimized form of garbage collection that works in the case that you have strictly acyclic data (or can tolerate leaking cycles).

It's interesting to note that python operates with the refcounting + tracing hybrid described (and I know of production deployments where they force the tracing collector to e.g. only run once every N requests because python's never got that fast either).

perl, meanwhile, is pure refcounting but has weak references so you can make data acyclic from a refcounting POV at the cost of having to pay attention to which parts of the cyclic structure you keep references to (and for cyclic object graphs there are some tricks you can pull to move the weak link around so you don't have to pay such attention, but that would be a whole comment on its own).

koka does refcounting in theory but tries to lift as much of it to compile time as possible in practice, plus if you do e.g. 'y = x + 1' and x has only a single reference to it and is guaranteed to not be used afterwards, it re-uses the storage for y and so can do in-place mutation.

nim, meanwhile, offers something called ORC, which is automatic reference counting plus an implementation of Bacon+Rajan's Recycler algorithm which is designed to collect -only- cycles in an otherwise refcounting based system and as such is really rather fast at doing so.

Coming back to Rust: There's a stop the world Recycler implementation (rather than concurrent as intended both by the original paper and the author's future plans if the itch returns) here - https://github.com/fitzgen/bacon-rajan-cc - released as the bacon-rajan-ccc crate from this fork - https://github.com/mbartlett21/bacon-rajan-cc - and the README for those contains an Alternative section linking to other experiments in the same area.

(I've yet to play with any of the rust versions but love Recycler sufficiently as a concept that I suspect if/when I encounter a situation where it might help I'll be going back and doing so)

If you're having trouble finding the Recycler papers, I stashed a copy of each under https://trout.me.uk/gc/ (and if you're a similar sort of strange to me, you might also enjoy the ones stashed under https://trout.me.uk/lisp/ ;)


> Are 16/24 bytes that much worse than 8/16 (with weak refs) for ref counting?

No, but that's not the right question. Most allocations in languages like C/C++/Rust aren't using ref counts. In fact excessive use of std::shared_ptr is a well known footgun in C++!

> the ability to use non-garbage collected arenas would mitigate that issue

I agree that not using GC is a way to mitigate GC issues! But if you're only able to performantly use GC for a limited number of allocations you also aren't getting much benefit from GC.

