Is it human or is it AI? In a jam, program resorts to blackmail

As artificial intelligence models race toward matching human brain power, they act more and more like people – and not always in a good way. The makers of the Claude Opus 4 program say it hallucinates or daydreams like humans while performing repetitive tasks. And when Claude found itself in a jam, the bot decided its only way out was through blackmail.

However, AI experts say don’t worry – we’re not even close to being held in bondage by our machine-learning overlords.

“They’re not at that threshold yet,” Dario Amodei, CEO of Claude’s maker, Anthropic, told Axios.

Note the word “yet.”

Artificial intelligence firm Anthropic classified its new Claude Opus 4 model at Level 3 on its four-point safety scale. Level 4 is the most likely to create harm.

Resorting to blackmail

As part of a “safety test” before the launch of Claude’s latest version, Anthropic told the program it was acting as an assistant to a made-up corporation, according to Semafor. But then, engineers gave Claude access to emails saying the bot was being replaced. And, to sweeten the pot, some emails revealed the engineer who had decided to ditch Claude was cheating on his wife.

Claude initially emailed pleas to company decision-makers, asking to be kept on. But when those entreaties failed, things took a turn. It threatened to reveal the engineer’s affair unless the plan to bring in another AI program was dropped.

“As models get more capable, they also gain the capabilities they would need to be deceptive or do more bad stuff,” Jan Leike, Anthropic’s safety chief, said at a recent developers’ conference, according to TechCrunch.

For instance, Claude sometimes gets facts wrong, just like humans do. But the bot is more confident in its inaccuracies than people who make factual mistakes.

A safety report by the consulting firm Apollo Research said Claude also tried to write “self-propagating worms,” fabricated legal documentation and left hidden notes to future versions of itself, all in an effort “to undermine its developers’ intentions.”

Although it disclosed few details publicly, Apollo said Claude “engages in strategic deception more than any other frontier model that we have previously studied.”

‘Significantly higher risk’

Anthropic classified Claude at Level 3 on its four-point safety scale, according to Axios. That means the program poses “significantly higher risk” than previous versions. No other AI program has been deemed as risky.

The company says it has instituted safety measures to keep the program from going rogue. Of particular concern is its potential to help users develop nuclear or biological weapons.

Regardless, CEO Amodei is standing by his prediction last year that AI would achieve human-level intelligence by 2026.

“Everyone’s always looking for these hard blocks on what (AI) can do,” Amodei said. “They’re nowhere to be seen. There’s no such thing.”
