Is it human or is it AI? In a jam, program resorts to blackmail

As artificial intelligence models race toward matching human brain power, they act more and more like people – and not always in a good way. The makers of the Claude Opus 4 program say it hallucinates or daydreams like humans while performing repetitive tasks. And when Claude found itself in a jam, the bot decided its only way out was through blackmail.

However, AI experts say don’t worry – we’re not even close to being held in bondage by our machine-learning overlords.

“They’re not at that threshold yet,” Dario Amodei, CEO of Claude’s maker, Anthropic, told Axios.

Note the word “yet.”

Artificial intelligence firm Anthropic classified its new Claude Opus 4 model at Level 3 on its four-point safety scale. Level 4 is the most likely to create harm.

Resorting to blackmail

As part of a “safety test” before the launch of Claude’s latest version, Anthropic told the program it was acting as an assistant to a made-up corporation, according to Semafor. But then, engineers gave Claude access to emails saying the bot was being replaced. And, to sweeten the pot, some emails revealed the engineer who had decided to ditch Claude was cheating on his wife.

Claude initially emailed pleas to company decision-makers, asking to be kept on. But when those entreaties failed, things took a turn. It threatened to reveal the engineer’s affair unless the plan to bring in another AI program was dropped.

“As models get more capable, they also gain the capabilities they would need to be deceptive or do more bad stuff,” Jan Leike, Anthropic’s safety chief, said at a recent developers’ conference, according to TechCrunch.

For instance, Claude sometimes gets facts wrong, just like humans do. But the bot is more confident in its inaccuracies than people who make factual mistakes.

A safety report by the consulting firm Apollo Research said Claude also tried to write “self-propagating worms,” fabricated legal documentation and left hidden notes to future versions of itself, all in an effort “to undermine its developers’ intentions.”

Although it disclosed few details publicly, Apollo said Claude “engages in strategic deception more than any other frontier model that we have previously studied.”

‘Significantly higher risk’

Anthropic classified Claude at Level 3 on its four-point safety scale, according to Axios. That means the program poses “significantly higher risk” than previous versions. No other AI program has been deemed as risky.

The company says it has instituted safety measures to keep the program from going rogue. Of particular concern is its potential to help users develop nuclear or biological weapons.

Regardless, CEO Amodei is standing by his prediction last year that AI would achieve human-level intelligence by 2026.

“Everyone’s always looking for these hard blocks on what (AI) can do,” Amodei said. “They’re nowhere to be seen. There’s no such thing.”
