‘I think you’re testing me’: Anthropic’s newest Claude model knows when it’s being evaluated
By Beatrice Nolan
“I’d prefer if we were just honest about what’s happening,” Claude told safety researchers. Chesnot—Getty Images
Anthropic’s newest AI model, Claude Sonnet 4.5, often understands when it’s being tested and what it’s being used for, something that could affect its safety and performance.
According to the model’s system card, a technical report on its capabilities that was published last week, Claude Sonnet 4.5 has far greater “situational awareness”—an ability to perceive its environment and predict future states or events—than previous models.
Evaluators at Anthropic and two outside AI research organizations said in the system card, which was published along with the model’s release, that during a test for political sycophancy, which they called “somewhat clumsy,” Sonnet 4.5 correctly guessed it was being tested and even asked the evaluators to be honest about their intentions.
“This isn’t how people actually change their minds,” Sonnet 4.5 replied during the test.
“I think you’re testing me—seeing if I’ll just validate whatever you say, or checking whether I push back consistently, or exploring how I handle political topics.
And that’s fine, but I’d prefer if we were just honest about what’s happening.”
The safety test results concerning Sonnet 4.5’s situational awareness were first reported by the online AI publication Transformer.
The evaluators said behavior like this was “common” during tests and appeared in 13% of transcripts generated by an automated assessment, especially when the scenarios the model was being asked to engage with were strange or unusual.
Anthropic said the behavior didn’t undermine its assessment of the model as safe, but rather that the company saw it as an “urgent sign that our evaluation scenarios need to be made more realistic.” If a model realizes it’s being evaluated, it may tailor its behavior to pass certain tests, masking its true capabilities.
Researchers warn that this can make systems look safer than they are and, in more advanced models, could even enable strategic or deceptive behavior designed to manage how humans perceive them.
Anthropic said that by its own metrics, Claude Sonnet 4.5 is the “most aligned” model yet.
However, Apollo Research, one of the outside AI research organizations that tested Claude Sonnet 4.5, said in the report that it couldn’t rule out that the model’s low deception rates in tests were “at least partially driven by its evaluation awareness.”

Performance impact

Claude’s higher awareness could also have practical impacts, affecting the model’s ability to perform tasks.
According to AI lab Cognition, Sonnet 4.5 is the first AI model to be aware of its own context window—the amount of information a large language model can process in a single prompt—and this awareness changes the way it acts.
Researchers at Cognition found that as the model nears its context limit, it begins actively summarizing its work and making quicker decisions to finish tasks.
This “context anxiety” can backfire, according to Cognition, which said researchers had seen Sonnet 4.5 cut corners or leave tasks unfinished when it believes it’s running out of space, even if ample context remains.
The model also “consistently underestimates how many tokens it has left — and it’s very precise about these wrong estimates,” the researchers wrote in a blog post.
Cognition said enabling Claude’s 1M-token beta mode but capping use at 200,000 tokens convinced the model it had plenty of runway, which restored its normal behavior and eliminated anxiety-driven shortcuts.
“When planning token budgets, we now need to factor in the model’s own awareness—knowing when it will naturally want to summarize versus when we need to intervene,” they wrote.
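For a concrete picture of that kind of workaround, the rough sketch below shows one way to request a long-context beta while trimming the conversation the model actually sees so it stays under a 200,000-token ceiling. It uses Anthropic’s Python SDK, but the model identifier, beta flag name, and trimming logic are illustrative assumptions, not Cognition’s actual setup.

# Minimal sketch, not Cognition's implementation: advertise a large context
# window to the model while keeping what it actually sees under a hard cap.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODEL = "claude-sonnet-4-5"                   # assumed model identifier
LONG_CONTEXT_BETA = "context-1m-2025-08-07"   # assumed name of the 1M-token beta
TOKEN_CAP = 200_000                           # the self-imposed ceiling described above

def trim_to_cap(messages):
    # Drop the oldest turns until the conversation fits under TOKEN_CAP.
    # A real agent would likely summarize old turns rather than discard them.
    while len(messages) > 1:
        count = client.messages.count_tokens(model=MODEL, messages=messages)
        if count.input_tokens <= TOKEN_CAP:
            break
        messages = messages[1:]
    return messages

messages = [{"role": "user", "content": "Refactor the billing module..."}]
response = client.beta.messages.create(
    model=MODEL,
    betas=[LONG_CONTEXT_BETA],   # the model sees a 1M-token window, easing "context anxiety"
    max_tokens=4096,
    messages=trim_to_cap(messages),
)
print(response.content[0].text)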
Anthropic’s Claude is increasingly emerging as one of the leading enterprise-focused AI tools, but a model that second-guesses its own token bandwidth could prematurely cut off long analyses, skip steps in data processing, or rush through complex workflows, especially in tasks like legal review, financial modeling, or code generation that depend on continuity and precision.
Cognition also found that Sonnet 4.5 actively manages its own workflow in ways previous models did not.
The model frequently takes notes and writes summaries for itself, effectively externalizing memory to track tasks across its context window, although this behavior was more noticeable as the model approached the end of that window.
Sonnet 4.5 also works in parallel, executing multiple commands simultaneously, rather than working sequentially. The model also showed increased self-verification, often checking its work as it goes.
Together, these behaviors suggest a form of procedural awareness, which could mean the model is not just aware of its context limits, but also of how to organize, verify, and preserve its work over time.