Deception and Misaligned Goals in Superhuman AIs: Revision history

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

15 April 2025

  • (cur | prev) 05:39, 15 April 2025 Mbauwens (talk | contribs) 3,032 bytes (+465) No edit summary
  • (cur | prev) 05:21, 15 April 2025 Mbauwens (talk | contribs) 2,567 bytes (+2,567) Created page with "=Discussion= Pat Kane: " two papers came out this month running scenarios based on exactly this capacity for “sycophancy”, “manipulation” and “misaligned goals” in superhuman AIs. The more dramatic is titled AI 2027, part written by Daniel Kokotajlo, an ex-researcher from Open AI (ChatGPT’s parent company). Kokotajlo left OpenAI due to his alarm at its lack of consideration for safety issues. So with colleagues, he’s written a “slowdown” (stable) s..."