
Sounds like they were hanging onto a pointer to an object allocated by the GC. For example, in the Python/C API this happens if you keep using a borrowed reference (a PyObject*) after the object has gone out of scope and been collected.



Probably pinning the object in memory, so that you can keep a stable reference to it without the danger of the GC moving it away under your pointer.

Conservative GCs don't know which bytes in memory are actually pointers, so they treat every word in memory as being a pointer if it looks like one. This means if you have some other value that happens to look like a pointer — in this case a float — the GC will think it's pointing to some other memory and keep that memory around even though it isn't used.
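To make the "float that looks like a pointer" case concrete, here is a toy sketch in Python. It is not how any real collector is implemented: the heap bounds, the addresses, and the helper names `float_bits` and `conservative_roots` are all made up for illustration. It only shows that a scan which cannot tell types apart has to keep anything whose bit pattern lands inside the heap range.

```python
import struct


def float_bits(value: float) -> int:
    """Reinterpret a 64-bit float as the unsigned integer with the same bytes."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def conservative_roots(words, heap_start, heap_end):
    """Keep every word that falls inside the heap range, whatever its real type."""
    return [w for w in words if heap_start <= w < heap_end]


# Hypothetical heap bounds and object address, chosen only for this sketch.
HEAP_START, HEAP_END = 0x7F00_0000_0000, 0x7F00_0010_0000
live_object = 0x7F00_0000_1000

# A (subnormal) float whose bit pattern happens to equal an address in the heap.
unlucky_float = struct.unpack("<d", struct.pack("<Q", 0x7F00_0000_2000))[0]

# Words found on a pretend stack: one real pointer, two floats, one small int.
stack_words = [live_object, float_bits(unlucky_float), float_bits(3.14), 42]
retained = conservative_roots(stack_words, HEAP_START, HEAP_END)
```

Here `retained` contains both the real pointer and the unlucky float's bit pattern, so whatever lives at that second address would be kept alive even though nothing actually refers to it.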

Also, if we're talking about passing pointers to objects to C code, GC frequently moves objects in memory which will invalidate the pointer. It's not possible to reliably predict when this will happen, so you need extra language support for pinning to do any of the things in this tutorial without accessing invalid memory.
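A toy model of why pinning matters, as a hedged sketch only: `ToyHeap` is a hypothetical class in which list indices stand in for addresses and `None` marks a freed slot. Real moving collectors are far more involved, but the effect on a raw address held by outside code is the same in spirit.

```python
class ToyHeap:
    """A toy compacting heap: 'addresses' are list indices, None marks a free slot."""

    def __init__(self):
        self.slots = []
        self.pinned = set()

    def alloc(self, value):
        self.slots.append(value)
        return len(self.slots) - 1  # the raw "address" handed to outside code

    def free(self, addr):
        self.slots[addr] = None

    def pin(self, addr):
        self.pinned.add(addr)

    def compact(self):
        """Slide unpinned live objects toward index 0; return an old -> new map."""
        new_slots = [None] * len(self.slots)
        for addr in self.pinned:
            new_slots[addr] = self.slots[addr]  # pinned objects stay put
        moved = {}
        free = 0
        for addr, value in enumerate(self.slots):
            if value is None or addr in self.pinned:
                continue
            while new_slots[free] is not None:
                free += 1
            new_slots[free] = value
            if free != addr:
                moved[addr] = free
        self.slots = new_slots
        return moved


heap = ToyHeap()
a, b, c, d = (heap.alloc(name) for name in "abcd")
heap.pin(d)      # pretend C code holds the raw address of "d"
heap.free(a)
moved = heap.compact()
```

After `compact()`, the raw address `c` no longer refers to "c" (that slot is now empty), while the pinned address `d` is still valid. Code holding `c` has no way to learn about the move unless the runtime tells it.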

Which is like having a C pointer to an object controlled by a GC, and the GC does not know (or care) that you have a reference to one of its objects.

No, you have to verify that your libraries don't keep disguised pointers to objects you allocated via the GC, that you don't yourself keep pointers to. It's quite unusual in C to give a reference to an object to another piece of code then drop all the pointers to it that you hold, because the usual discipline is that the module that allocates an object frees it.

Thanks for the explanation. I didn't realize Go's GC also worked for pointers/references.

Not really. Looking at the doc for that function:

"There is no way to convert the pointer back to its original value. Typically this function is used only for debug information."

(In CPython, you can e.g. stash a PyObject* away in C globals.)


The latter. The GC won’t collect it while the pointer references it.

IIRC they originally were stored inside of the interface, but then the GC would have to check whether they were a value or a pointer and this switching was too costly.

That's not really true, at least not with C.

C doesn't permit arbitrary pointer arithmetic. For example, if you take a pointer to the first element of an array and decrement it, the behaviour is undefined. It's permissible for the GC to crash your program in that situation. You don't need to dereference it: just having that pointer, even ephemerally, is undefined behaviour.

If you deallocate an object, then all pointers to that object are, technically, rendered as though uninitialized: it is undefined what happens if you even attempt to read the address of a previously deleted object. This permits implementations to set dangling pointers to NULL.

Finally, a pointer to e.g. an int can't be used to access e.g. a float at that location. It is valid to convert an int pointer to a float pointer, but the consequences of dereferencing it are undefined. An exception exists for characters: any pointer can be converted to a char pointer and used to read the bytes of the object.

The upshot is that every pointer in C always points at a valid object of a known type - or just-allocated memory which hasn't had a value put in it yet - or the GC is allowed to crash the program. Of course, most (all?) compilers don't actually generate code where it's always possible to find out what type of object a pointer is pointing at, or even whether the pointer has actually gone out of range.

The real issue in C (other than the fact that a lot of code depends on the above rules not being enforced) is that it's valid to convert a pointer to an integer, store it for long enough for an eager GC to remove the object, and then convert that integer back to a pointer. Now you have an invalid pointer, but the standard doesn't allow the pointer to become invalid. That means a valid GC must assume any integer might be a pointer in disguise (unless it can prove it could never be converted into a pointer), and it can't move objects around either, because it can't just go modifying integers which might be pointers to those objects.
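The same "pointer disguised as an integer" round trip can be shown from Python, as a rough analogy rather than C itself. This sketch relies on a CPython implementation detail (that `id()` returns the object's address) and on `ctypes.cast`. The `Payload` class is made up for the example.

```python
import ctypes


class Payload:
    def __init__(self, label):
        self.label = label


obj = Payload("kept alive")

# CPython detail (an assumption of this sketch): id() is the object's address.
disguised = id(obj)

# From here on, `disguised` is an ordinary int; the runtime does not treat it
# as a reference, so it does nothing to keep the object alive.

# Convert the integer back into an object reference.
recovered = ctypes.cast(disguised, ctypes.py_object).value
```

This is only safe because `obj` is still holding a real reference. If every real reference were dropped before the cast, the integer would point at freed memory and the cast would be reading garbage, which is exactly the situation a collector cannot rule out once pointers may hide in integers.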


> FWIW my assumption always is that it's "a borrow".

But for how long? Forget about multithreading for a moment, but can my function store the pointer for later use? This is incredibly common in OOP, and dangling pointers/references are one of the (if not the) top sources of bugs in C and C++. And unlike other sources of bugs, there’s no good fundamental remedy against it.


Those references aren't real pointers though; the underlying runtime may have copied or moved the underlying memory behind the scenes.

Well, passing raw pointers to garbage collected objects is impossible without language support for "pinning". This is because the GC will move objects around in memory breaking any pointers you're using in your C code. That may be a generous interpretation of the author (it could just be bile), but in the context of the article (about C interoperability) that's what comes to mind.

> can the behaviour of the following bit of C code depend

> on a modern computer, a pointer dereference could very well lead to the execution of Python code.

So the BEHAVIOR of the C code does not change. The dereference of the pointer triggers a memory page load, and if that load is successful, a numeric value is returned and added to the array. If the load fails, you will have the undefined result of accessing uninitialized memory.

In both cases, the behaviour of the code remains squarely within the C standard - with the actual result of the computation contingent on various external factors.


Pointers and manual memory management apparently.

> and if you need to point back to the object that owns you

That's not the interesting case; that is indeed easily addressed by "weak" pointers. What's interesting is the case of a spaghetti reference graph where none of the objects definitely "owns" any other, and even the choice of GC roots might be dynamic.
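For reference, the "easy" case looks roughly like this in Python, using the standard `weakref` module. The `Owner` and `Child` classes are hypothetical. The owner holds its children strongly and the child only holds a weak back reference, so the pair never forms a strong cycle.

```python
import gc
import weakref


class Child:
    def __init__(self, name):
        self.name = name
        self.owner_ref = None  # weak back reference, set by Owner.add

    def owner(self):
        """Return the owner if it is still alive, else None."""
        return self.owner_ref() if self.owner_ref is not None else None


class Owner:
    def __init__(self):
        self.children = []  # strong references: the owner keeps children alive

    def add(self, child):
        self.children.append(child)
        child.owner_ref = weakref.ref(self)  # does not keep the owner alive


owner = Owner()
child = Child("leaf")
owner.add(child)
alive_before = child.owner() is owner

del owner      # drop the only strong reference to the owner
gc.collect()   # not strictly needed in CPython, but makes the intent explicit
alive_after = child.owner() is not None
```

This only works because it is clear who owns whom. In the spaghetti-graph case there is no obvious edge to make weak, which is why that case is the interesting one.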


Python already has this 'hell of pointers' in the ctypes module.

Where `to_ptr(x)` is:

  ptr = ctypes.pointer(ctypes.py_object(x))

And `ptr.dereference()` is:

  ptr.contents.value
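Put together as something runnable, the two snippets might look like this. The function names `to_ptr` and `dereference` are just wrappers around the expressions above. There is no actual `dereference()` method on ctypes pointers, so it is written as a plain function here.

```python
import ctypes


def to_ptr(x):
    """Wrap a Python object in a ctypes pointer to a py_object."""
    return ctypes.pointer(ctypes.py_object(x))


def dereference(ptr):
    """Follow the pointer and return the Python object it refers to."""
    return ptr.contents.value


data = {"answer": 42}
ptr = to_ptr(data)
same = dereference(ptr) is data  # the very same object, not a copy
```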

Not only that but you have a pointer to a parameter returned back and used outside its scope ...

Basically, C semantics are to blame. Because of the way C was designed and the liberties it allows its users, it is like programming in assembly from a tracing GC's point of view.

Without any kind of metadata, the GC has to assume that any value on the stack or in the global memory segments is a possible pointer. It cannot be sure, though: the value might just be a number that looks like a valid pointer to GC-allocated data.

So any algorithm that needs to be certain about exact data types, so that it never moves or rewrites the wrong data, is already off the table for C.

See https://hboehm.info/gc/ for more info, including the references.
