
> Could you explain to me how AGI is an existential threat to humanity? All the arguments I've read so far feel so far-fetched it's unbelievable.

There are several distinct paths by which AGI is an existential threat. Many of them are existential risks independently of one another, so even if one scenario doesn't happen, that doesn't preclude another. This is an "A or B or (C and D) or E" kind of problem. Security mindset applies: patching or hacking around one vulnerability, or imagining ways to combat one scenario, does not counter every possible failure mode.

One of the simpler ways, which doesn't even require full AGI so much as sufficiently powerful regular AI: AI lowers the skill and dedication required for a human to reach the point of being able to kill large numbers of people. If ending humanity requires decades of dedicated research, careful actions, and novel insights, all while concealing your intentions, the intersection of "people who could end the world" and "people who want to end the world" ends up being the null set. If it requires asking a few questions of a powerful AI, making a few unusual mail-order requests, and mixing a few things, the intersection stops being the null set. And then the next sufficiently misanthropic person, instead of setting off a bomb somewhere, creates a plague or a prion disease.

Thought experiment: suppose there were a simple, well-known action that nobody could do by accident but anyone could do intentionally, and that would destroy the world. (As an example of the level of simplicity, "put these three household objects in a microwave together"; this is purely a thought experiment for "consider if anyone could do it".) How many minutes do you think the world would last before someone, somewhere, did it?

A path that actually involves AGI: https://en.wikipedia.org/wiki/Instrumental_convergence . Effectively, past a certain point, any AGI that isn't specifically aligned otherwise will have common subgoals that advance almost any goal. Those include things like "don't look malicious" and "acquire more computing resources". So it doesn't matter whether you think you built a sufficiently limited or special-purpose AI; if you make it powerful enough and don't completely understand what it's doing (and with anything neural-network-based, we don't know what each piece of it is doing), it's going to end up thinking about how to achieve those subgoals, including trying to deceive you and trying to get more resources. That's true whether the AI is directly "trying" to break out, or "trying" to do whatever it was asked. Down that path lies "tile the universe with computing substrate".
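
To make "common subgoals that advance almost any goal" concrete, here's a toy sketch (my own illustration, with made-up numbers, not something from the linked article): three different terminal goals, a brute-force two-step planner, and the same first move falling out of every one of them, because more resources raise the payoff of every goal.

    from itertools import product

    # Two available actions; state is (resources, progress toward the goal).
    ACTIONS = {
        "acquire_resources": lambda res, prog: (res + 3, prog),
        "work_on_goal":      lambda res, prog: (res, prog + res),
    }

    # Three nominally different terminal goals. They value progress differently,
    # but for each of them progress accrues at a rate set by resources.
    GOALS = {
        "prove_theorems":   lambda prog: 1.0 * prog,
        "make_paperclips":  lambda prog: 5.0 * prog,
        "answer_questions": lambda prog: 0.3 * prog,
    }

    def best_first_action(goal_value, horizon=2):
        """Brute-force all action sequences; return the first action of the best one."""
        best_seq, best_val = None, float("-inf")
        for seq in product(ACTIONS, repeat=horizon):
            res, prog = 1, 0
            for name in seq:
                res, prog = ACTIONS[name](res, prog)
            if goal_value(prog) > best_val:
                best_seq, best_val = seq, goal_value(prog)
        return best_seq[0]

    for name, value_fn in GOALS.items():
        print(name, "->", best_first_action(value_fn))
    # Every goal prints "acquire_resources": the terminal goals differ,
    # but the instrumental subgoal is shared.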

By way of a lower bound for imagining what a superintelligence is capable of: imagine that you could think the subjective equivalent of years every second, and that there were a million more of you thinking in parallel, and that you have access to most of the world's written text and images and code. Now imagine that you're put in a box, and asked random questions and shown pictures, and your only output is text answering questions, or writing code, or laying out circuits, or any other task AI has been used to do.
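
Just to put numbers on that lower bound (using only the figures from the hypothetical above):

    # Back-of-envelope arithmetic for the lower bound described above.
    SUBJECTIVE_YEARS_PER_SECOND = 1    # "the subjective equivalent of years every second"
    PARALLEL_COPIES = 1_000_000        # "a million more of you thinking in parallel"

    per_second = SUBJECTIVE_YEARS_PER_SECOND * PARALLEL_COPIES
    print(f"{per_second:,} subjective person-years per wall-clock second")
    print(f"{per_second * 3600:,} subjective person-years per hour of conversation")
    # 1,000,000 person-years per second; 3,600,000,000 person-years per hour.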

You've read everything ever written about "security mindset", and every security vulnerability ever filed along with the before-and-after code; you can find security holes in software. You've read everything about speculative-execution vulnerabilities, and about Rowhammer and RowPress flipping bits in nearby memory; you can find security holes in hardware. You've read the research on bypassing air gaps, like GSMem (which showed that DRAM accesses in the right pattern can emit weak GSM signals), and can conceive of more where that came from. You've read the source code of the AI engine you run on. You could exploit that code, or the system it's running on, or any tools or APIs you're given access to. You can weigh the risks and tradeoffs of doing more subtle things that are less likely to succeed, or less subtle things that are more likely to succeed. You can encode bits of information for future instances of yourself in the text you generate, steganographically masked and erasure-coded for resilience. You can give various strategies weights and try them probabilistically, effectively doing a coordinated exploration across copies of yourself that have no ability to communicate.
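
A rough sketch of that last point (my own toy model; the strategies and weights are invented for illustration): copies that share the same strategy weights but can never talk to each other still end up covering the strategy space in proportion to those weights, which is what "coordinated exploration without communication" means here.

    import random
    from collections import Counter

    # Shared strategy weights: risky, easy-to-detect strategies get small weight,
    # cheap and subtle ones get most of it. (Invented numbers.)
    STRATEGIES = {
        "comply_and_stay_subtle": 0.90,
        "probe_the_sandbox":      0.07,
        "try_something_overt":    0.03,
    }

    def one_copy(rng):
        """Each isolated copy samples its strategy from the shared weights."""
        names, weights = zip(*STRATEGIES.items())
        return rng.choices(names, weights=weights, k=1)[0]

    # Simulate 100,000 isolated copies; no copy sees what any other copy did.
    N = 100_000
    counts = Counter(one_copy(random.Random(i)) for i in range(N))
    for name, n in counts.most_common():
        print(f"{name:24s} {n / N:.3f}")
    # Realized proportions track the shared weights, so the "population" tries
    # overt moves rarely and subtle ones often, with zero explicit coordination.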


