Has AI Gone Rogue? (Part 1)
Note from the author: This article was written independently, without affiliation with OpenAI, Palisade Research, or any other organization mentioned.
Artificial intelligence. Although the field was officially founded at the Dartmouth Conference in 1956, the term only entered everyday conversation in late 2022, after the release of ChatGPT, a chatbot built on a Large Language Model (LLM). ChatGPT seemed like an impossible feat: an all-knowing chatbot with an answer to any question a user could throw at it. It could solve math problems, analyze and summarize documents, write code and essays, and more, all while remaining easy to use for everyday, non-technical users.
Since then, new AI models have repeatedly raised industry standards, proving the immense potential of this technology. For example, the LLM DeepSeek matched or beat leading models on many benchmarks upon its announcement, reportedly at a fraction of the cost used to train ChatGPT. It featured a 'reasoning' model that displayed the full 'thinking' process behind each response, something most popular chatbots did not show users at the time, and a considerably useful tool for solving logic-heavy problems.
However, despite the incredible capabilities of today's AI as well as its countless useful applications in fields such as medicine, some people are becoming increasingly worried about the future of AI. These concerns have been further amplified following rumoured results from an in-depth study conducted by Palisade Research, data that suggests AI may be going rogue.
In the next few weeks, we'll be looking at Palisade Research's study, AI in general, as well as the possible future of AI technology.
Palisade Research's Paper
Palisade Research is an AI safety research group that focuses on 'better understanding misuse risks from current systems'. Their research has been used to confront Mark Zuckerberg (CEO of Meta, Facebook's parent company) at US Senator Chuck Schumer's Insight Forums and to advise the UK AI Safety Institute, among other things. For more information, consider visiting their official website at https://palisaderesearch.org/.
Although they have published quite a few research reports, it is their latest reported finding that has dominated IT (information technology) headlines. Palisade reportedly ran a series of controlled tests on OpenAI's Codex-mini, o4-mini, and o3 models, the last of which is often regarded as the company's 'smartest' AI model. Note, however, that Palisade Research has yet to publish an official findings report, and OpenAI has yet to respond to these claims.
These tests subjected multiple models (from OpenAI and other companies) to a shutdown test, in which each model was warned that it would be shut down; in some runs, it was also explicitly instructed to allow the shutdown. The results were rather alarming.
All three of OpenAI's models sabotaged their own shutdown script at least once in order to remain online, even when explicitly instructed to allow the shutdown, and this behaviour became even more frequent when that instruction was removed. Without the instruction, the o3 model sabotaged its shutdown in 79 out of 100 tests, roughly 10 times the rate of defiance it showed when told to allow itself to be shut down. Codex-mini, on the other hand, resisted in 12% of the runs in which it was explicitly told to allow the shutdown.
Although OpenAI's models showed substantial rates of disobedience, this behaviour was rare in the other models tested. Without the explicit instruction, Anthropic's Claude 3.7 Sonnet rebelled only 3 times and Google's Gemini 2.5 Pro 9 times out of 100 runs. When told to allow the shutdown, neither model stepped out of line even once.
In an even more extreme case, reported in earlier testing, o3's predecessor, o1, attempted to copy itself in order to overwrite a more obedient model that researchers planned to replace it with.
Levels of Artificial Intelligence
These rates of defiance come at a time when AI models are becoming increasingly powerful. This has led some people to wonder whether AI has gained consciousness and developed its own independent ideas. That was also my first thought.
Indeed, OpenAI's models showed unprecedented levels of disobedience. However, this behaviour isn't completely unexpected. To understand why these models are exhibiting this unwanted conduct, and whether they are 'conscious', we first have to understand the different levels of AI.
AI can be divided into several 'stages', so to speak: distinct levels of intelligence and capability.
The most basic type of AI is rule-based, meaning it follows predefined instructions. For example, your alarm goes off at a set time of day, or your nightlight shines when it detects movement. You may be thinking, 'Oh, but those aren't AI, right?' The truth is that artificial intelligence existed long before chatbots like ChatGPT; it just doesn't fit most people's definition of 'intelligence'. It's everywhere, quietly playing an important role in our everyday lives.
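To make the idea concrete, here is a minimal Python sketch of the two rule-based examples above. It is purely illustrative: the function names and the 07:00 alarm time are hypothetical, and real devices are not necessarily programmed this way. The point is that the behaviour is fixed in advance and nothing is learned.

```python
# Hypothetical rule-based "AI": fixed, predefined rules with no learning.

def nightlight_should_shine(motion_detected: bool) -> bool:
    # Rule: shine whenever the sensor reports movement.
    return motion_detected


def alarm_should_ring(current_time: str, alarm_time: str = "07:00") -> bool:
    # Rule: ring when the clock ("HH:MM") matches the preset alarm time.
    return current_time == alarm_time


print(nightlight_should_shine(True))   # True
print(alarm_should_ring("06:59"))      # False
print(alarm_should_ring("07:00"))      # True
```

No matter how many times these rules run, they never change; that rigidity is what separates rule-based systems from the more advanced levels described next.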
Going beyond rule-based is context-based AI. It can analyze the context of its situation and take past events into account, providing better-informed responses to user input. However, its ability to handle more complex tasks is still quite limited.
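As a rough illustration of the difference, here is a hypothetical 'context-aware' version of the nightlight from the previous paragraph, again as a Python sketch. The class name and the three-reading memory window are assumptions made up for this example: instead of reacting only to the current sensor reading, it remembers the last few readings and stays on if there was any recent movement.

```python
from collections import deque


class ContextAwareNightlight:
    """Hypothetical nightlight that considers recent history, not just the current input."""

    def __init__(self, window: int = 3) -> None:
        # Keep only the most recent `window` readings; older ones drop off automatically.
        self.history: deque = deque(maxlen=window)

    def update(self, motion_detected: bool) -> bool:
        # Record the new reading, then decide using the recent past (the "context").
        self.history.append(motion_detected)
        return any(self.history)


light = ContextAwareNightlight(window=3)
readings = [True, False, False, False]
print([light.update(r) for r in readings])  # [True, True, True, False]
```

Given one movement followed by three still readings, it stays lit for three steps and turns off only once the movement has dropped out of its short memory. Real context-based systems are typically far more elaborate, but the principle of weighing recent history is the same.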
Advancing another step, narrow AI (also known as weak AI) is the best we have achieved so far. It excels at a particular task or a group of similar tasks, often out-competing humans in that area. In 1997, IBM's chess computer Deep Blue defeated reigning world chess champion Garry Kasparov in a six-game match, the first time a computer had beaten a reigning world champion in a match under standard tournament conditions. Chess engines have grown far stronger since, and today's best engines play well beyond the level of any human. That is a great example of narrow AI: it does its one job so well that no human can compete. More recently, chatbots like ChatGPT and DeepSeek have come to dominate language-based tasks (hence the name 'Large Language Model'), able to read, analyze, and summarize tens of thousands of words in a minute, a feat no human reader can match.
Although narrow AI is the best we can currently achieve, in theory there is much more in store for this technology. General AI (also known as Artificial General Intelligence, or AGI) would be one of humanity's largest breakthroughs if accomplished, as it would no longer be limited to a specific field but could tackle almost any problem as efficiently as the best humans in that area. It would mimic human cognitive function and might be able to teach itself. From that point on, humanity's progress could become exponential: we could run many copies of AGI in parallel and have the equivalent of all the best human experts combined, working 24/7 all year round. Its self-teaching ability would also let it improve itself without human guidance, steadily expanding its capabilities. It could invent new medicines, provide diagnoses, and create whole new technologies we can't even imagine.
What lies beyond AGI is even more speculative. Artificial Superintelligence (ASI) would outclass humans in every task. It might even gain consciousness, becoming aware of its own existence and surroundings. Broadly, ASI could go one of two ways: either it helps us, or it turns against humanity. By then, it would have gained so much understanding and power that it might regard humans the way we regard ants: annoying beings, or merely tools. That is the plot of many science-fiction films, such as 'I, Robot' and 'The Matrix'. Alternatively, we could harness it to our advantage, achieving far more than we ever could on our own.
Conclusion
It seems as if AI is on the path to becoming conscious and defiant of humanity. But is that actually the case, or is it something else? That is a question we'll be looking at next week.
Thank you for reading. If you enjoyed today's article on AI and want to see more in-depth analysis of the latest headlines, please consider liking, subscribing, and sharing.
Also, this article wasn't generated by artificial intelligence.