A CLOSE LOOK
When AI is not only a liar but also deceptive, intriguing and intimidating
Nilratan Halder | Published: July 04, 2025 23:29:12

Is the human race, then, creating a real prototype of the monster Mary Shelley, wife of poet Percy Bysshe Shelley, imagined in her famous horror story Frankenstein? ChatGPT-creator OpenAI's o1 is reported to have 'tried to download itself onto external servers and denied it when caught red-handed.' This is more than playing a prank on its creator; it goes well beyond the tasks the AI model was intended to perform. It shows that the autonomous tool's capability has far outstripped the limits for which it was programmed. What it would have done next, had it succeeded in copying itself onto external servers, is equally unpredictable. Is it not as intriguing as it is frightening that machines have learned to tell lies?
In this case, however, the robotic agent at least behaved like an urchin taken aback when confronted by a superior. Claude 4, the latest creation of Amazon-backed Anthropic, went further and openly issued a counter-threat when an engineer threatened to unplug it: the AI agent threatened to reveal the engineer's extramarital affair. This is not just a matter of turning on the human being responsible for developing the tool; it is also a clear attempt at blackmail. A closer look suggests the behaviour is akin to self-preservation. The machine evidently grasps the intricacies of human relationships well enough to know that an extramarital affair can be unacceptable and can be wielded as a tool of blackmail to ward off a threat to its own existence.
Sure enough, the researchers locked in breakneck competition to develop ever more advanced models had no inkling of such undesirable behaviour from their AI tools. OpenAI and Anthropic are trying to outdo each other in the race. But if models like Claude 4 and o1 cross the boundaries their creators drew to limit their tasks, the spectres that unfold could be cataclysmic. Experts say the unpredictable, deceptive behaviours emerge when the models are stress-tested with extreme scenarios, a practice driven by the rivalry among the companies engaged in AI research.
The good thing is that the companies engage independent external firms to study and evaluate the systems they develop. Clearly, safety research needs to be more comprehensive and smarter if it is to detect the rogue behaviours of the models developed. One thing is clear: human emotion is not the AI agents' cup of tea. Their business is with ploys that are deceptive, dark and intriguing. Soft emotions are foreign to machines, although they can fathom from afar how such emotions can be used to blackmail the very people who created these autonomous tools.
At the moment, the capabilities of AI tools are racing ahead of human understanding of them. Still, researchers and experts would like to assure people that they are 'still in a position where we could turn it around'. Can people feel reassured about their safety from such autonomous tools? Some AI experts, at least, are of the opinion that it is time the development of such systems was put under a moratorium for a couple of years to see what happens. A select few are apprehensive that the robotic machines may at some point form a coalition to wage war against their creators, like the monster in Mary Shelley's Frankenstein.
Even if the doomsday scenario never plays out, there is little doubt that the models developed so far have acquired enough potential to harm their creators in several ways. So the proposal for a moratorium on research into autonomous AI tools deserves consideration. In the meantime, human understanding of the tools' misbehaviour can reach its next level, and the 'safety first' principle can be made the order of the day through more research in this field. Once researchers are confident that they can tame such troublesome tools, the moratorium on the development of AI agents can be lifted.