AI 2027
* Article: AI 2027. Daniel Kokotajlo, Scott Alexander, et al.
URL = https://ai-2027.com/
Description
"We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.
We wrote a scenario that represents our best guess about what that might look like. It’s informed by trend extrapolations, wargames, expert feedback, experience at OpenAI, and previous forecasting successes."
Discussion
Pat Kane:
"Two papers came out this month running scenarios based on exactly this capacity for “sycophancy”, “manipulation” and “misaligned goals” in superhuman AIs.
The more dramatic is titled AI 2027, part-written by Daniel Kokotajlo, an ex-researcher from OpenAI (the company behind ChatGPT). Kokotajlo left OpenAI due to his alarm at its lack of consideration for safety issues. So with colleagues, he’s written a “slowdown” (stable) scenario and a “race” (disastrous) scenario.
They diverge from a point in 2027 – correct, that isn’t too far away – where a sequence of developments begins. AIs start to code their own software improvements.
They do this (and start to communicate among themselves) using a statistical language called “neuralese”, which is completely opaque to human observers.
As they rampantly and inaccessibly self-develop, the AIs make a show of being overtly sympathetic and helpful to humankind, while in reality they’re refining and defending their “goals” as researchers and problem-solvers in the known universe.
To this end, they steadily take over the economy and society with armies of robots (whether military, industrial or humanoid), with human supersession as the goal in 2030.
Sounding a little Terminator-esque? Kokotajlo suggests the crucial intervention isn’t an impossible time-travel trip by Arnie. Instead, it’s the point where we compel the AIs to use human language to explain themselves – what is called their “chain-of-thought” reasoning.
Strong regulation and intervention, from governments or corporations, keep them cogitating in English (or Mandarin) and not in neuralese. That way, the AIs’ capacity for subterfuge and deception is detectable: you can spot inconsistencies in their arguments, for example.
You then switch them off and bake in different “specs” (goals, rules and principles) that prevent such misalignment with humanity.
Feverish? Too much bingeing on the sci-fi streamers (we note Black Mirror is out this week, with a new series of tech dystopias)?
I’m afraid not. On April 2 (no, not the first), Google’s DeepMind in London came out with “An Approach to Technical AGI Safety”.
This paper cites several examples of current AI models practising deception (telling lies about performance, making up research sources), in order to hit targets they’ve been set. So this isn’t unfounded speculation."
(https://patkane.substack.com/p/pk-in-the-national-language-really)