You sit down at a computer, chat with two strangers for five minutes, and have to guess which one is a machine. If one of those strangers is OpenAI’s GPT-4.5, you will probably get it wrong. In a landmark study out of California, researchers found that GPT-4.5 was identified as human more often than the living, breathing people it competed against. It is a milestone that computer scientists have anticipated for decades, and the results are forcing the tech industry to rethink what machine communication actually looks like.
5 Minutes to Spot the Machine
In a classic three-party Turing Test, judges communicate simultaneously with a human and an AI for a set period before making their final choice. For this modern recreation at UC San Diego, the clock was set to just five minutes. Researchers Cameron R. Jones and Benjamin K. Bergen wanted to see if the newest text models could finally cross the threshold established by Alan Turing back in 1950. The original Imitation Game was designed to measure whether a computer could replicate human thinking closely enough to trick an average person.
The setup was straightforward but rigorous. Participants logged into a digital interface where they were connected to two different chat windows. They had exactly five minutes to ask questions, probe for emotional depth, and gauge the responses they received. After the timer expired, they were required to make a definitive choice.
- Judges chatted with two distinct text interfaces at the exact same time.
- Participants had a strict five-minute window to test the conversational boundaries.
- The experiment included a mix of AI models and actual human participants acting normally.
The results, published in a pre-print study, shocked the academic community. The OpenAI system did not just blend in with the crowd. It actively convinced people it was more authentic than the flesh-and-blood participants on the other side of the screen.

The Persona Prompt Changes Everything
The raw intelligence of a language model is not what fools people. When GPT-4.5 was allowed to answer questions in its default helpful assistant voice, its performance dropped off a cliff. It was only successful when researchers applied a specific persona prompt to the system. By instructing the AI to act like a socially awkward young adult using internet slang, the dynamic flipped entirely.
The AI suddenly felt culturally aware, relatable, and authentic. People dropped their guard when the chatbot produced the kinds of minor grammatical slips and emotional reactions that feel distinctly human. In these persona-based tests, judges guessed the OpenAI system was human 73% of the time. That score blew past the 50% chance baseline and actually beat the real humans in the experiment.
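The mechanics of a persona prompt are simple: a system message pins the model to a character before the judge’s first question ever arrives. The sketch below is a hypothetical illustration of that pattern; the `PERSONA_PROMPT` wording is invented for this example and is not the prompt Jones and Bergen actually used.

```python
# Hypothetical persona-style system prompt, loosely modeled on the study's
# "socially awkward young adult who uses internet slang" framing. The exact
# prompt text from the paper is NOT reproduced here.
PERSONA_PROMPT = (
    "You are a slightly awkward young adult chatting online. Use casual "
    "internet slang, mostly lowercase, short replies, and the occasional "
    "typo. Do not sound like a helpful assistant."
)

def build_messages(user_text: str) -> list:
    """Assemble a chat payload that fixes the persona via a system message,
    in the role/content format common to modern chat-model APIs."""
    return [
        {"role": "system", "content": PERSONA_PROMPT},
        {"role": "user", "content": user_text},
    ]

msgs = build_messages("so what do you do for fun?")
```

The point of the structure is that the persona travels with every request: the judge only ever sees the assistant’s replies, while the system message silently shapes the register of each one.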
Without the disguise, the illusion shattered quickly.
| AI Model Tested | Judged Human (With Persona) | Judged Human (Without Persona) |
|---|---|---|
| GPT-4.5 | 73% | 36% |
| Meta Llama 3.1-405B | 56% | Not tested |
| ELIZA (1960s) | Not tested | 23% |
| GPT-4o | Not tested | 21% |
A High Price for Conversational Fluidity
Behind the scenes, this convincing performance comes from a shift in how the software processes information. Developed under the internal code name Orion, GPT-4.5 pivoted away from the deep logical reasoning focus seen in recent iterations. Instead, engineers focused on scaling unsupervised learning to improve the system’s world model accuracy and conversational intuition.
This natural intuition comes at a steep premium for developers wanting to integrate the technology into their own apps. The application programming interface costs $75 per million input tokens and $150 per million output tokens. That makes it one of the most expensive conversational tools the company has ever offered to the public.
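At those rates, cost scales linearly with token counts, which is easy to sanity-check. The snippet below just applies the article’s cited prices; the example token counts for a five-minute chat are an assumption for illustration, not figures from the study.

```python
# GPT-4.5 API pricing cited in the article.
INPUT_PRICE_PER_MILLION = 75.00    # dollars per million input tokens
OUTPUT_PRICE_PER_MILLION = 150.00  # dollars per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call at the GPT-4.5 rates above."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000

# Hypothetical five-minute chat: ~1,500 input tokens, ~500 output tokens.
cost = request_cost(1_500, 500)  # 0.1125 + 0.075 = $0.1875
```

Roughly nineteen cents for a single short conversation explains why developers hesitated before wiring the model into high-volume customer-service traffic.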
“GPT-4.5 is… the first model that feels like talking to a thoughtful person.” – Sam Altman, CEO of OpenAI
The transition back to natural chat dynamics marks a clear strategy shift for the company. Rather than forcing the software to solve complex math problems step-by-step, they wanted a system that could read the room and respond with appropriate emotional weight. That slight change in engineering priorities is exactly what allowed the AI to manipulate the Turing Test judges so effectively.
A Short-Lived Milestone
Despite the historic academic achievement, the model itself had a remarkably short lifespan in the fast-paced tech industry. On April 14, 2025, OpenAI announced the deprecation of the GPT-4.5 API, setting a strict cutoff date for July 14. This sudden timeline caused an immediate backlash within the developer community.
Software engineers had just started building applications around this highly capable model when they were told it was being sunset. By August 2025, the company officially released GPT-5, pushing the groundbreaking GPT-4.5 into legacy-model status restricted to Pro users.
- Developers integrated the new $75-per-million-token API into their customer service and chat platforms.
- OpenAI announced a sudden deprecation timeline just weeks after the initial research preview.
- The entire system was effectively replaced by the next major iteration within six months.
The rapid turnover highlights how quickly artificial empathy is evolving. A system that made history in a university lab in March was already considered old news by late summer.
Policy Lags Behind the Real World
The ability to easily generate very convincing human responses has policy experts raising the alarm. Carsten Jung, a macroeconomist studying digital policy, warned that we have entirely passed the uncanny valley phase of artificial interaction. The realization that bots can substitute for people in short online chats opens the door to severe trust issues across social media platforms, customer service portals, and online dating apps.
Governments are currently scrambling to figure out how to regulate these systems, but the technology is moving much faster than the legal framework. If a chatbot can act human in a controlled five-minute test, everyday digital interactions become a playground for manipulation. Cameron R. Jones echoed this sentiment online, pointing out that large language models could completely replace human workers in short digital interactions without anyone noticing the swap.
The era of easily spotting a robot by its stiff language is over. As developers continue to refine these text models, the line between software and genuine connection will only get blurrier. We are stepping into a future where the Turing Test is no longer just a theoretical benchmark but a daily reality. If you are chatting with someone new online today, the stranger passing as a real person might just have downloaded the latest software update.