> But to ask a pointed question, doesn't that mean Nim gets the worst of both worlds? You have both the overhead of updating reference counts and the (relatively long) garbage collection pauses.
There's a patch for the collector that makes the M&S incremental, to avoid pauses for this step too. Of course, whether a deferred RC'ing GC with an incremental cycle collector works better in practice than a more conventional generational incremental GC (or just an incremental GC) is an open question. :-)
Nim has garbage collected pointers (ref) and raw pointers (ptr). You can use 'ptr' to break up cycles manually and disable the M&S. I suppose it's very much comparable to Swift except that Nim's RC overhead is much lower since it's deferred RC.
That has been my experience too: all the extra work and logic (because the algorithm is complicated) you need for detecting cycles and doing trial deletion is so expensive that regular mark & sweep beats it.
But to ask a pointed question, doesn't that mean Nim gets the worst of both worlds? You have both the overhead of updating reference counts and the (relatively long) garbage collection pauses. I guess if the programmer codes in such a way that no cyclic garbage is created it is not a problem because the gc will never be triggered. But how common is that in practice? Does the language make it easy to avoid cycles?
By naive I refer to RC implementations that simply instrument pointer assignments with the necessary reference counting instructions. Non-naive RC implementations try to minimize the overhead associated with it; deferred reference counting or having the compiler optimize away unnecessary reference count updates are typical approaches.
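To make the deferred idea concrete, here is a toy C++ sketch (all names invented here; this models the technique, not any real collector): count updates are buffered and applied in bulk at a safepoint instead of on every pointer assignment.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy model of deferred reference counting: buffer the +1/-1 updates
// and apply them rarely, instead of touching counts on every assignment.
struct DeferredRC {
    std::unordered_map<int, long> counts;      // object id -> refcount
    std::vector<std::pair<int, int>> pending;  // buffered (id, delta) updates

    // Called on every pointer assignment: just a cheap buffer append.
    void assign(int id, int delta) { pending.push_back({id, delta}); }

    // Called rarely (e.g. at a safepoint); naive RC's per-assignment
    // overhead is paid here in bulk. Returns ids that dropped to zero.
    std::vector<int> flush() {
        for (auto& [id, d] : pending) counts[id] += d;
        pending.clear();
        std::vector<int> dead;
        for (auto& [id, c] : counts)
            if (c == 0) dead.push_back(id);
        for (int id : dead) counts.erase(id);
        return dead;
    }
};
```

The point of the buffer is that the hot path (assignment) stays cheap and non-atomic; the bookkeeping is batched.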
Erlang and Dart have tracing GCs; I was referring to their concurrency model (which uses thread-local heaps), not their method of garbage collection. Nim uses either deferred reference counting or a simple mark-and-sweep collector, depending on a compile time switch, but also uses thread-local heaps. The main attraction of Nim's deferred RC collector is that it can keep pause times very low; I haven't actually benchmarked its amortized performance, but I expect it should be competitive with or better than the Boehm GC, especially for large heaps.
> I wouldn't consider reference counted GC systems level appropriate. It _can_ be more deterministic, but not when it's tracing and collecting cycles (which a decent modern RC implementation must do)
Swift has what you would call an ‘indecent’ RC design, then, because it doesn’t.
> and it usually plays havoc on L1 cache (by touching counts).
Swift (like Objective-C) can store the refcounts either inline or in supplementary structures, for performance.
> Now if the garbage collection was optional
In the common case, it practically is, as (most) value types will only ever live on the stack, or inline in another data structure.
Nim's new garbage collector was designed with embedded systems in mind [1]. It uses non-atomic/non-locking reference counting combined with C++11-style move semantics. Move semantics enables both performance gains and a way to ensure memory is only "owned" by one thread (which is what allows non-atomic reference counting). There's optional cycle collection support, but I prefer to keep embedded code acyclic.
On your second point: you can use Nim's effect system to ensure no allocations occur in a given function or its callees. Currently no one has done this; it might be tricky with the reference counting, but not too difficult either.
I used Nim with ARC for a latency sensitive embedded project. To avoid allocation on the tight reading loop I pre-allocated the values needed, or made them global variables. A few extra integer ops didn't matter, but allocation is sloooow. However, I'd eventually like to add allocation tracking to the effect system. Rather, make the existing effects more granular for this purpose.
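The pre-allocation pattern is easy to sketch; here is a hedged C++ illustration (names are mine, not from the project): do all the allocation before the latency-sensitive loop, then only reuse the buffer inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the pre-allocation pattern: allocate once up front, and
// guarantee the hot path never allocates (names are illustrative).
struct SampleBuffer {
    std::vector<int> samples;
    explicit SampleBuffer(std::size_t n) { samples.reserve(n); }  // one allocation

    void record(int v) {
        // push_back never reallocates while size() < capacity(),
        // so the tight loop pays only a bounds check and a store.
        if (samples.size() < samples.capacity()) samples.push_back(v);
    }
};
```

The invariant worth asserting in real code is exactly the one below: the buffer's storage pointer never moves during the loop.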
Yes, with some relatively minor caveats, mostly having to release memory yourself.
Also, the arc/orc GC is shaping up and already allows you to use exclusively reference counted memory management - so, efficient and perfectly deterministic timing but still get automatic memory management. (As usual, if you introduce cycles, it becomes more complicated)
And the Nim compiler elides many ref/unref ops, as well as keeping objects thread-local, so most performance objections to ref counting don’t actually apply. (.... and you have a choice of other automatic GC modes, including “none”)
> A pretty naive question: what is the advantage of garbage collection over just reference counting?
It's not that naive, and the unpopular answer is "not much". A good modern GC, faced with a usage paradigm that it expects (this part is important!), is almost certainly going to be faster. It's indeed safe from circular reference leaks. Generational collectors can often compact memory better and have a smaller footprint for the same data.
Those are all real advantages. But they come at the cost of outrageous complexity and code footprint. A reference counter is probably more than an 80% solution for almost any reasonable application, and it's something college kids write in 15-20 lines in their intro course.
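For the record, here is roughly that intro-course version, sketched in C++ (non-atomic, no cycle handling, illustrative only):

```cpp
#include <cassert>

// The "intro course" reference-counted smart pointer: non-atomic,
// no cycle collection. Roughly the 80% solution described above.
template <typename T>
class Ref {
    struct Box { T value; long count; };
    Box* box = nullptr;
    void release() { if (box && --box->count == 0) delete box; }
public:
    explicit Ref(T v) : box(new Box{v, 1}) {}
    Ref(const Ref& o) : box(o.box) { if (box) ++box->count; }
    Ref& operator=(const Ref& o) {
        if (box != o.box) { release(); box = o.box; if (box) ++box->count; }
        return *this;
    }
    ~Ref() { release(); }
    T& operator*() { return box->value; }
    long use_count() const { return box ? box->count : 0; }
};
```

Everything a production version adds (atomics, weak references, cycle handling) layers on top of exactly this skeleton.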
> Briefly looking at the article I think the author's opinion on this matter is pretty clear that an additional algorithm such as a cycle collector is required to be considered GC.
That is almost certainly not what David Bacon thinks. I don't know how you could get that idea from reading the introduction. Reference counting is a form of garbage collection, period.
C++, Objective-C, etc. started muddying the waters in the '90s and 2000s and got people thinking that somehow reference counting isn't GC and therefore doesn't have the downsides of automatic memory management. But the literature is clear on this point.
> A ton of energy’s been poured into research for garbage collection, particularly since Java has come up. There’s been hundreds of papers written in the academic circles, tons of work in HotSpot and other Java [2:03:30] implementations to do different tweaks and different tunings and different new kinds of algorithms in garbage collecting. That work really hasn’t been done for ARC yet, so really, I think there’s still a big future ahead.
I disagree strongly; there has been plenty of work in optimising reference counting systems. A quick look through Jones, Moss and Hosking's Garbage Collection Handbook provides many descriptions of high performance reference counting, but they all use deferred reference counting [1], rather than the immediate reference counting that Swift/Objective-C/Apple uses. That Apple is stuck in 1975 does not say much about other reference counting users.
(Though Lattner only says "ARC", and I believe only Apple calls their refcounting "ARC"; I guess Rust has "Arc" as well, but that is a completely different thing. Otherwise I haven't seen the acronym used elsewhere.)
> But to do that, they use this tricolor algorithm which dramatically lowers throughput, because they’re doing almost exactly the same kinds of operations that ARC is doing.
Concurrent tracing algorithms often log into a local buffer, without synchronisation. Whereas concurrent immediate reference counting performs atomic updates - it's almost exactly the same except for the parts where they really aren't the same.
High performance concurrent RC systems also use local buffers [2] to avoid atomic updates too. Go figure.
> People recommend tracing garbage collection because it performs better than atomic reference counting, even though reference counting is much simpler.
Reference counting is a lot more complicated! It turns every read into a write.
> [...] if my goal is performance and predictable latency?
Reference counters don't have predictable latency: a single reference decrement can lead to an arbitrary amount of follow up work, like more decrements and deallocations.
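You can see the unbounded follow-up work with a plain shared_ptr chain (C++ sketch; the counter is just instrumentation):

```cpp
#include <cassert>
#include <memory>
#include <utility>

// One decrement (dropping the head of a chain) triggers a cascade of
// further decrements and deallocations, one per node in the chain.
struct Node {
    std::shared_ptr<Node> next;
    int* destroyed;  // instrumentation: counts destructor runs
    Node(std::shared_ptr<Node> n, int* d) : next(std::move(n)), destroyed(d) {}
    ~Node() { ++*destroyed; }
};

int cascade_length(int n) {
    int destroyed = 0;
    std::shared_ptr<Node> head;
    for (int i = 0; i < n; ++i)
        head = std::make_shared<Node>(head, &destroyed);
    head.reset();      // a single "decref" of the head...
    return destroyed;  // ...but n destructors ran before it returned
}
```

So the pause hiding inside one `reset()` call is proportional to the size of the structure it happens to own, which is exactly the unpredictability being described.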
Performance of reference counting isn't generally better than that of garbage collectors. (Especially since you still need a backup solution to deal with cycles.)
Figuring out a lot of your memory management at compile time is definitely a win. But doesn't really have much to do with reference counting.
The Nim team is currently working on removing the garbage collector by means of a new reference counting based “garbage collector mode” called “arc” (for automatic reference counting). You can get more info in the following link, where it is described as “plain old reference counting with optimizations thanks to move semantics”:
The objective is to make Nim suitable for embedded programming and other use cases for which garbage collection is a non-starter.
This new --gc:arc mode is already available in the nightly builds and the benchmarks are already impressive. I believe that the plan is to make arc the default “garbage collector mode” in Nim 1.2.
> Perhaps the biggest misconception about reference counting is that people believe it avoids GC pauses. That's not true.
When I can easily replace the deallocator (thus excluding most non-RC production GCs), I can (re)write the code to avoid GC pauses (e.g. by amortizing deallocation, destructors, etc. over several frames - perhaps in a way that returns ownership of some type and its allocations to the type's originating thread, reducing contention while I'm at it!) I have done this a few times. By "coincidence", garbage generation storms causing noticeable delays are surprisingly uncommon IME.
As programs scale up and consume more memory, "live data" outscales "garbage" - clever generational optimizations aside, I'd argue the former gets expensive more quickly, and is harder to mitigate.
It's also been my experience that tracing or profiling any 'ole bog standard refcounted system to find performance problems is far easier and more straightforward than dealing with the utter vulgarity of deferred, ambiguously scheduled, likely-on-a-different-thread, frequently opaque garbage collection - as found in non-refcounted garbage collection systems.
So, at best, you're technically correct here - which, to be fair, is the best kind of correct. But in practice, I think it's no coincidence that refcounting systems tend to automatically and implicitly amortize their costs and avoid GC storms in just about every workload I've ever touched, and at bare minimum, reference counting avoids GC pauses... in the code I've written... by allowing me easier opportunities to fix them when they do occur. Indirectly causal rather than directly causal.
> if you read a heap-object out of a data structure, you have to increment the object's reference count.
This isn't universal. Merely accessing and dereferencing a shared_ptr in C++ won't touch the refcount, for example - you need to copy the shared_ptr to cause that. Rust's Arc/Rc need to be clone()d to touch the refcount, and the borrow checker reduces much of the need to do such a thing defensively, "in case the heap object is removed out from under me".
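A quick C++ illustration of that shared_ptr behaviour (the helper name is mine): reads through the pointer are free, and only copying the handle touches the count.

```cpp
#include <cassert>
#include <memory>

// Dereferencing a shared_ptr generates no refcount traffic at all;
// the count only moves when the handle itself is copied or destroyed.
long count_after_reads(const std::shared_ptr<int>& p) {
    int sum = 0;
    for (int i = 0; i < 1000; ++i) sum += *p;  // plain loads, no atomics
    (void)sum;
    return p.use_count();
}
```

Passing by `const&` in the helper is the usual idiom for exactly this reason: it avoids the copy, and therefore the atomic increment/decrement pair.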
Of course, it can be a problem if you bake refcounting directly into language semantics and constantly churn refcounts for basic stack variables while failing to optimize said churn away. There's a reason why many GCed languages don't use reference counting to optimize the common "no cycles" case, after all - often, someone tried it out as an obvious and low hanging "optimization", and found it was a pessimization that made overall performance worse!
And even without being baked into the language, there are of course niches where heavy manipulation of long-term storage of references will be a thing, or cases where the garbage collected version can become lock-free in a context where such things actually matter - so I'll 100% agree with you on this:
> There are no hard lines; this is about performance tradeoffs, and always will be.
> Note also that you can do cycle detection concurrently with the mutator's operation (again, see [1]). Concurrent cycle detection makes the implementation more complicated, but no more so than a concurrent tracing scheme.
I disagree—it's much more complicated. Concurrent GC is not trivial, but it doesn't have seven different colors and two different tests (sigma and delta test).
A good concurrent reference counting system is just as complex as a good tracing garbage collector, because it requires the same runtime machinery—precise stack maps, write barriers, etc., to perform backup tracing. But it also has the reference counts to deal with. It ends up being more complex overall.
> Moreover, if your program is indeed performance-critical, RC schemes allow you to minimize the impact on cycle detection by designing your code so as to avoid cycles (either by using acyclic data structures exclusively or by using weak pointers).
No, that's my point—even if you use acyclic data structures, there is no guarantee that you won't have the CC run because you had multiple references to an object and you dropped one, making the cycle collector suspect it was part of a cycle. This is the entire reason why ForgetSkippable was added to Firefox: this happened a lot in practice. The solution was to add a bunch of ad-hoc code to make the cycle collector drop objects from the purple buffer, much like the "acyclic" pragma does in Nimrod—but now you have the possibility of leaks unless this is done very carefully.
> Conversely, tracing schemes make it difficult to programmatically influence worst case pause time.
In collectors like Metronome, it's extremely easy: set the pause time to what you want it to be. That's the only solution that's really scalable in my view: working around the collector by manually telling the CC not to scan certain types feels too error-prone.
> not every AMM technique works by producing garbage then collecting it
And the confusing thing is that garbage collection (GC) doesn’t collect garbage, while reference counting (RC) does.
GC doesn’t look at every object to decide whether it’s garbage (how would it determine nothing points at it?); it collects the live objects, then discards all the non-live objects as garbage.
RC determines an object has become garbage when its reference count goes to zero, and then collects it.
That difference also is one way GC can be better than RC: if there are L live objects and G garbage objects, GC has to visit L objects, and RC G. Even if GC spends more time per object visited than RC, it still can come out faster if L « G.
That also means that GC gets faster if you give it more memory, so that it runs with a smaller L/G ratio (the large speed differences in modern memory cache hierarchies make that not quite true, but I think it still holds, ballpark).
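A toy mark phase makes the asymmetry concrete (C++ sketch, invented names): the collector's work is proportional to the live set, and garbage, including cycles, is simply never visited.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy mark phase: walk the L live objects reachable from the roots.
// The G garbage objects (everything not marked) are never touched.
struct Heap {
    std::map<int, std::vector<int>> edges;  // object id -> referenced ids
    std::vector<int> roots;

    // Returns the live set; 'visited' counts how many objects were walked.
    std::set<int> mark(int* visited) {
        std::set<int> live;
        std::vector<int> stack(roots);
        while (!stack.empty()) {
            int id = stack.back(); stack.pop_back();
            if (!live.insert(id).second) continue;  // already marked
            ++*visited;                             // work is O(L), not O(L+G)
            for (int t : edges[id]) stack.push_back(t);
        }
        return live;
    }
};
```

Note that an unreachable cycle falls out of this for free: since marking starts from the roots, a garbage cycle is never reached and never counted, with no extra machinery.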
> We don't want Garbage Collection to the contrary we want reference counting. Reference counting is the best compromise between handling memory manually and using GC ala Java. I would argue that the mental strain for the programmer is equal for reference counting(at least with ARC) to that of GC. However Reference Counting is extremely lightweight.
I used to think that, until I started to spend more time on modern GC platforms like Java and .NET. It's impressive how performant a good generational garbage collector can be. For many business applications (i.e., not the kinds of workloads you'll see being simulated by popular benchmarks) a good generational GC can even improve performance thanks to improved locality of reference. I think a lot of us don't fully appreciate the cost of a cache miss.
Speaking of, I'm not sure a lot of us fully appreciate the cost of synchronization, either. It's one big reason why I'm not convinced that reference counting is particularly well-suited to concurrent programming, since it turns a really commonplace operation (retaining a reference to an object) into a synchronization point in the program. There's a non-trivial cost associated with that. You could avoid repeatedly stalling the CPU by being very careful about not sharing objects among threads, but I don't think most developers can reasonably be expected to make a habit of that in practice.
> The basic idea is that not only is reference counting a form of garbage collection..
Oh, how this leads to debate. The confusion is whether automatic reference counting without a cycle detector counts as GC in the popular view (what language publicizes a GC and does this?), and whether people consider GC to be one form of automatic memory management among others.
Briefly looking at the article I think the author's opinion on this matter is pretty clear that an additional algorithm such as a cycle collector is required to be considered GC.
edit: "Reference counting" itself also isn't even necessarily 'automatic'. I just find it odd the way this gets phrased. GC can use reference counting, not the other way around.
> this meme needs to die. Garbage collection is not "super sophisticated reference counting", and in fact, can't even be used in most places where reference counting is used. It's not a substitution, or a superset, of reference counting.
I never said that garbage collection is sophisticated reference counting, so if you've seen that "meme," it wasn't here. But reference counting and tracing are two classical approaches to garbage collection, the garbage collection literature does refer to reference counting as a garbage collection algorithm, and refcounting is used for garbage collection in languages such as Rust. Refcounting has other uses, too, but this discussion is about garbage collection and not about 101 uses for refcounting.
Anyway, reference counting GC is not intrinsically more primitive or worse-performing than tracing, but in practice, tracing GCs tend to be sophisticated, while languages that rely on refcounting usually employ a rather simple algorithm. In fact, this is one of the attractions of refcounting for GC: it is easier to get OK performance with primitive refcounting than with primitive tracing.
> instruction pollution from all the refcount updates and their synchronization
If the language supports ownership tracking and non-synchronized Rc<> as in Rust, refcount updates ought to be rare and/or quick. I agree that this is very much an issue in languages with obligate "managed" memory such as Swift, and that tracing GC may sometimes be preferable to that.
> too much time spent freeing memory
If you're using Rc, you probably care about memory reclaim being deterministic, which tracing GC doesn't give you. You can also use arenas to free a bunch of objects all at once; this also addresses memory fragmentation and slow allocation to some extent.
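A minimal bump-allocator sketch of the arena idea (C++, hand-rolled purely for illustration; `std::pmr::monotonic_buffer_resource` is a production version of the same concept):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal bump arena: carve allocations out of one block; there is no
// per-object free, and destroying the arena releases everything at once.
struct Arena {
    std::vector<char> block;
    std::size_t used = 0;
    explicit Arena(std::size_t n) : block(n) {}

    void* alloc(std::size_t n) {
        void* p = block.data() + used;
        used += (n + 7) & ~std::size_t(7);     // bump with 8-byte alignment
        return used <= block.size() ? p : nullptr;  // nullptr when exhausted
    }
};
```

Allocation is a pointer bump and deallocation is the arena's destructor, which is where the deterministic "free a bunch of objects all at once" behaviour comes from.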
> Reference counting is a form of GC, the difference being that the programmer has full control over when it runs. Most real time systems preallocate everything and never deallocate.
I really don't think Swift is suitable for real time embedded systems, if that's what you suggest. Even C++ is often unsuitable. Rust might have potential one day, if not already.
> One typical problem with GC systems is common patterns and library code cause allocations, which cause the GC to run - consuming cycles and causing jitter even if nothing is freed.
Any library code can allocate or not allocate regardless of language or memory allocation system.
> (Note that even calling malloc() is non-deterministic and best avoided in real time code.)
> The current standard languages are Objective-C and Swift, both with automatic reference counting (which is mostly a GC, although you have to manually break cycles).
I don't want to nitpick, but isn't there a _world_ of difference between ARC and your standard GC? In a GC I expect various different "spaces" for different generations of memory allocation and an assortment of algorithms for detecting unreferenced memory at various times. "Stop the world" events etc.
I may be wrong, but I always thought of ARC as a somewhat glorified pointer structure where each allocated memory got a counter of how many places are currently pointing to it, and when that counter reaches 0, deallocation happens. No fancy generation spaces, no complicated cycle finding algorithms and no stopping the world.
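And the "manually break cycles" part is exactly what weak references are for; a C++ sketch of the same idea (the destruction counters are just instrumentation):

```cpp
#include <cassert>
#include <memory>

// A parent/child cycle of shared_ptr would leak both objects; making
// the back-edge a weak_ptr breaks the cycle so both are destroyed.
struct Child;
struct Parent {
    std::shared_ptr<Child> child;
    int* destroyed;
    explicit Parent(int* d) : destroyed(d) {}
    ~Parent() { ++*destroyed; }
};
struct Child {
    std::weak_ptr<Parent> parent;  // weak back-edge: no refcount bump
    int* destroyed;
    explicit Child(int* d) : destroyed(d) {}
    ~Child() { ++*destroyed; }
};

int run_pair() {
    int destroyed = 0;
    {
        auto p = std::make_shared<Parent>(&destroyed);
        auto c = std::make_shared<Child>(&destroyed);
        p->child = c;
        c->parent = p;  // were this a shared_ptr, neither would ever free
    }
    return destroyed;   // both destructors ran: the cycle was broken
}
```

This is the manual work the thread keeps coming back to: plain counting handles everything except cycles, and those get handed back to the programmer.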