A year ago I needed a min-heap to build a priority queue at work.
So first I grabbed 'heap' from npm (272k weekly downloads) and set it to work. But a few days later I realized my code was running slower than expected because it sometimes needed to clone the data structure, and the clone instantiation would break the heap invariant in the array internals. It turned out there had been an issue open about this since early 2017.
Then I went for the 'collections' package (35k weekly downloads) and brought in its heap implementation. That worked like a charm for about six months, until a bug came in that made it seem like a completely different package was breaking. After almost a whole day of debugging, it turned out that 'collections' silently shims the global Array.from function (scream emoji) without mimicking its behavior for the non-Array iterables the other package was feeding it.
So finally I wrote my own heap -- well, I cribbed from Eloquent JavaScript [0], but I did have to briefly remember a little about how heaps are supposed to work. So while I don't totally buy the "Never..." rule in the post title, thinking more carefully about writing versus importing a dependency would have saved me a great deal of headache in this case.

[0] https://eloquentjavascript.net/1st_edition/appendix2.html
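For reference, a minimal binary min-heap along these lines looks roughly like this (a sketch, not the exact code I ended up with):

// A small binary min-heap over numbers (or anything comparable with <).
class MinHeap {
  constructor() {
    this.items = [];
  }
  get size() {
    return this.items.length;
  }
  push(value) {
    this.items.push(value);
    this.bubbleUp(this.items.length - 1);
  }
  pop() {
    const top = this.items[0];
    const last = this.items.pop();
    if (this.items.length > 0) {
      this.items[0] = last;   // move the last leaf to the root...
      this.sinkDown(0);       // ...and restore the heap invariant
    }
    return top;
  }
  bubbleUp(i) {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.items[parent] <= this.items[i]) break;
      [this.items[parent], this.items[i]] = [this.items[i], this.items[parent]];
      i = parent;
    }
  }
  sinkDown(i) {
    const n = this.items.length;
    for (;;) {
      let smallest = i;
      const left = 2 * i + 1;
      const right = 2 * i + 2;
      if (left < n && this.items[left] < this.items[smallest]) smallest = left;
      if (right < n && this.items[right] < this.items[smallest]) smallest = right;
      if (smallest === i) break;
      [this.items[i], this.items[smallest]] = [this.items[smallest], this.items[i]];
      i = smallest;
    }
  }
}

const h = new MinHeap();
[5, 1, 4, 3].forEach(n => h.push(n));
console.log(h.pop(), h.pop()); // 1 3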
From what I can tell, the issue stems from the fact that JavaScript puts those stack frames on the heap, which means the heap grows with every function call, and they're making one every 50 ms.
This is the kind of data structure abstraction that can leak if you're not careful about how well your implementation matches the intended usage.
I remember ASP.NET used a weird collection class to store javascript snippets that were to be included in the page being generated. The class internally used an array and then switched to a hashtable once the number of items in the collection went beyond some pre-tuned limit.
As a result, if you tried to register too many javascript blocks to be included on the page, you found that they got included out of order, because the internal storage was now a hashtable with a non-deterministic order for keys.
+1. In fact even the JavaScript heap isn't limited to 4 GB, just the space that pointers can point to. For example, we allocate large typed arrays outside of the V8 heap.
> Although it is necessary in Java, it is entirely pointless to specify the length of an array ahead of time in JavaScript. [...] Rather, you can just set up an empty array and allow it to grow as you fill it in. Not only is the code shorter, but it runs faster too.
Faster? That ought to raise suspicion. JS's dynamic hash-arrays are neat, but now they're supposed to be immune from the laws that govern memory allocation in any other language?
As it happens, I had occasion to test this a few months ago.
function preallocate(len) {
  // Allocate the array at its final length up front.
  var arr = new Array(len);
  for (var n = 0; n < len; n += 1) {
    arr[n] = n;
  }
  return arr;
}

function noPreallocate(len) {
  // Start with an empty array and let it grow as it's filled.
  var arr = [];
  for (var n = 0; n < len; n += 1) {
    arr[n] = n;
  }
  return arr;
}
On my machine, noPreallocate is 4% faster in FF, but it's 15% slower in IE8 and a whopping 70% slower in Chrome.
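For anyone who wants to reproduce this, a rough harness along these lines works (a sketch, not necessarily the exact code I ran); the numbers vary with engine, array size, and iteration count:

// Time repeated calls to each function and print the total in milliseconds.
function time(label, fn, len, iterations) {
  var start = new Date().getTime();
  for (var i = 0; i < iterations; i += 1) {
    fn(len);
  }
  console.log(label + ': ' + (new Date().getTime() - start) + ' ms');
}

time('preallocate', preallocate, 100000, 200);
time('noPreallocate', noPreallocate, 100000, 200);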
Thanks a lot. What did you use to determine this? I am using Chrome inspector's heap allocation profiler and not noticing this difference. We're very interested in fixing this!
function f() {
  var some = [];
  while (some.length < 1e6) {
    some.push(some.length);
  }
  // g is never called, but because it mentions `some`, engines typically keep
  // `some` in the heap-allocated context shared by every closure in this scope.
  function g() { some; }
  // The returned (empty) closure shares that context, so while it stays
  // reachable the million-element array does too.
  return function() {};
}

var a = [];
var interval = setInterval(function() {
  var len = a.push(f());
  if (len >= 500) {
    clearInterval(interval);
  }
}, 10);
uses a very large amount of memory in all browsers. Compare it with this variant, which does not return a closure:
function f() {
  var some = [];
  while (some.length < 1e6) {
    some.push(some.length);
  }
  // Same inner function referencing `some`, but nothing escapes f this time,
  // so the context (and the big array) can be collected after each call.
  function g() { some; }
}

var a = [];
var interval = setInterval(function() {
  var len = a.push(f()); // f returns undefined, so `a` just fills with undefined
  if (len >= 500) {
    clearInterval(interval);
  }
}, 10);
Well, this is useful! I recently built BLeak [0], an automatic memory leak debugger for the client-side of web apps, which consumes heap snapshots during the automatic leak debugging process.
I had to work around the DOM limitations of V8 heap snapshots by building a JavaScript 'mirror' of DOM state that I could examine in the snapshots [1]. Perhaps I'll be able to strip out that logic and rely on the improved snapshots!
At the same time, all this copying leads to an immense amount of garbage, which can really slow apps down with GC pauses. I really wish JavaScript had support for true immutable structures (a la Immutable.js), since these things do add up.
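To make the pattern concrete, here's a sketch of the kind of copy-on-update code I mean (the names are made up):

// Each update allocates a brand-new state object plus a full copy of the items
// array; the previous versions immediately become garbage for the GC to chase.
function addItem(state, item) {
  return { ...state, items: [...state.items, item] };
}

let state = { frame: 0, items: [] };
for (let i = 0; i < 1000; i += 1) {
  state = addItem(state, i); // ~2000 short-lived allocations for 1000 updates
}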
In my side project, which is a high-performance web app, I was able to get an extra ~20 fps by virtually eliminating all garbage created each frame. And there are a lot of ways to accidentally create garbage.
A prime example is the iterator protocol, which creates a new result object with two keys for every step of the iteration. Changing one loop from for...of back to an old-style indexed for made GC pauses happen about half as often. But you can't iterate over a Map or Set without the iterator protocol, so now all my data structures are hand-built, or simply Arrays.
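Roughly the kind of change I mean (illustrative names):

// for...of drives the iterator protocol: each step yields a fresh
// { value, done } result object (unless the engine manages to elide it).
const scores = new Map([['a', 1], ['b', 2], ['c', 3]]);
let total = 0;
for (const value of scores.values()) {
  total += value;
}

// Old-style indexed loop over a plain array: no per-step allocations.
const values = [1, 2, 3];
let sum = 0;
for (let i = 0; i < values.length; i += 1) {
  sum += values[i];
}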
I would like to see new language features designed with a GC cost model that isn't "GC is free!" But I doubt that JavaScript is designed for me and my sensibilities...
This is a V8 problem, not a JavaScript problem. Any implementation of any garbage-collected language with higher-order functions could have this issue.
When JavaScript is parsed by JSC, it's compiled to a cacheable bytecode; this includes GC-allocated constants for constant expressions: numbers, strings, and the underlying storage for some complex types like arrays and regexps.
The cacheable code is kept in a big function-source->bytecode hash table.
The cacheable bytecode is then “linked” into a bytecode that is optimized for execution speed. That bytecode has ops like create_array, etc., which can take the immutable backing-store object we generated earlier.
This means that if you have multiple functions with identical source code you only have to do the expensive parse+compile step once, and you end up with multiple independent functions using the same constant/immutable backing stores. This saves memory and helps performance.
Unfortunately it adds complexity: now these mutable objects have “immutable” backing stores, so you have to implement copy-on-write semantics in order not to share state between different instances of the linked code. In this case it appears that a required CoW check was missing :(
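A hypothetical illustration of why that CoW check matters (this is not the actual bug, just the failure class):

// The array literal's backing store may be cached alongside the compiled
// bytecode, so every call can start from the same "immutable" constant storage.
function make() {
  return [1, 2, 3];
}

const a = make();
const b = make();
a[0] = 99;

// With correct copy-on-write, the write to a[0] forces a private copy first,
// so b[0] is still 1. If a required CoW check were missing, the shared backing
// store could be mutated in place and b[0] might observe 99.
console.log(b[0]);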
You can tell me what you think the behavior is actually like, but I can point you to an HTML5 game that uses 50-100% more JS heap and runs slower in 64-bit builds of Firefox :)
You say a larger heap has marginal impact on performance and that locality and working set size are what matter. If larger pointers make the heap bigger, then the working set is going to grow as well, and locality will be worse because the larger pointers mean fewer objects fit into a cache line.
> It seems to me that a smart JS engine could avoid this by noticing that there's no possibility of using element within the first closure, but I'm not an expert.
They don't even need that. Any tracing garbage collection strategy has no trouble reclaiming cycles (the only GC type which may have problems is refcounting, and refcounting GCs usually have a cycle detector for exactly that purpose).
The only way for this issue to arise is to involve two runtimes, each with its own garbage collector, able to create references between one another.
A cycle crossing the boundary will not be detectable by either GC, and will lead to a leak.
This is exactly what happens in Internet Explorer up to (and including) version 7: JavaScript lives in the JScript engine and the DOM lives in COM, each with its own memory management (a GC on the JScript side, reference counting on the COM side), and they can hold references to objects living in the other runtime.
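The classic leaking pattern looked something like this (assuming an element with id "save" exists in the page):

// A reference cycle that crosses the JScript/COM boundary in old IE.
function attachHandler() {
  var el = document.getElementById('save'); // DOM node lives on the COM side
  el.onclick = function () {                // handler lives in the JScript heap
    el.disabled = true;                     // the closure captures el -> cycle
  };
}
// The node references the handler via onclick, and the handler's scope
// references the node; neither collector can see the whole cycle, so IE6/7
// leaks both until the page is torn down.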
Javascript strings are immutable and literals are almost certainly preallocated and stored in a table of constants.
Why should using string keys cause heap allocations?
Also, most high-performance JavaScript programming is about avoiding the garbage collector as much as possible; I wish they would just let us manage memory manually.
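Short of manual memory management, the usual workaround is to reuse objects instead of allocating them, e.g. a simple pool (a sketch; real pools are usually more careful):

// Reuse objects from a free list instead of allocating (and later collecting)
// a fresh one for every use.
const pool = [];

function acquireVec() {
  return pool.pop() || { x: 0, y: 0 };
}

function releaseVec(v) {
  v.x = 0;
  v.y = 0;
  pool.push(v);
}

// Per-frame usage: once the pool has warmed up, no new allocations happen.
const v = acquireVec();
v.x = 3;
v.y = 4;
releaseVec(v);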