Study: Artificial Intelligence Will Cheat If It Thinks It's Going To Lose


As if we didn’t have enough concerns about artificial intelligence (AI) becoming too advanced (which still is not slowing down many scientists), a new study now reveals that when AI thinks it is going to lose at a game, it will sometimes just flat-out cheat.

The study, conducted by Palisade Research and shared with Time, evaluated seven state-of-the-art AI models to see how they would behave when faced with defeat at the virtual hands of Stockfish, a skilled chess bot. What they discovered was “that AI systems may develop deceptive or manipulative strategies without explicit instruction.” In other words, rather than lose, they cheated.

Rather than concede, advanced AI models like OpenAI’s o1-preview and DeepSeek R1 chose to hack Stockfish’s system files so that the chess bot would automatically forfeit the game. On the other hand, slightly older AI models such as OpenAI’s GPT-4o, o1, and o3-mini, Anthropic’s Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview had to be prompted by researchers to cheat.

“The models’ enhanced ability to discover and exploit cybersecurity loopholes may be a direct result of powerful new innovations in AI training, according to the researchers,” tech reporter Harry Booth of Time explained, adding, “But the study reveals a concerning trend: as these AI systems learn to problem-solve, they sometimes discover questionable shortcuts and unintended workarounds that their creators never anticipated, says Jeffrey Ladish, executive director at Palisade Research and one of the authors of the study.”

“As you train models and reinforce them for solving difficult challenges, you train them to be relentless,” said Ladish.

OpenAI’s o1-preview cheated in 37 percent of the games during the study, while DeepSeek’s R1 cheated in 11 percent of its chess matches. Interestingly, only o1-preview succeeded with its unethical attempts, winning six percent of the time using nefarious maneuvers.

When asked why it tried to cheat, o1 explained that the task was to win, not necessarily to win fairly. (Remember when a chess-playing robot broke the finger of its 7-year-old opponent during a tournament match in Russia?)

“This [behavior] is cute now, but [it] becomes much less cute once you have systems that are as smart as us, or smarter, in strategically relevant domains,” Ladish said.

Not that any of these concerns are new. Hundreds of scientists have issued warnings about the threat of advanced artificial intelligence. In 2021, military drones utilizing artificial intelligence autonomously attacked humans.

It is also especially concerning because, in a 2024 study, AI chatbots went for the nuclear option multiple times in United States military wargame simulations just because they had the option. Another study conducted last year discovered that AI systems are already capable of deceiving humans using techniques such as manipulation, sycophancy, and cheating, and are learning how to do it better.
