Inside the Anthropic ‘red team’ tasked with breaking its AI models—and burnishing the company’s reputation for safety
Fortune


Why This Matters

Anthropic’s Frontier Red Team is unique for its mandate to raise public awareness of model dangers, turning its safety work into a possible competitive advantage in Washington and beyond.

September 4, 2025

By Sharon Goldman, AI Reporter

From left, Anthropic’s Frontier Red Team leader Logan Graham; co-founder and head of policy Jack Clark; and Frontier Red Team member Keane Lucas. Anthropic’s Frontier Red Team is unique among AI companies in having a mandate to both evaluate its AI models and publicize its findings widely. Photo courtesy of Anthropic

Last month, at the 33rd annual DEF CON, the world’s largest hacker convention in Las Vegas, Anthropic researcher Keane Lucas took the stage.

A former U.S. Air Force captain with a Ph.D. in electrical and computer engineering from Carnegie Mellon, Lucas wasn’t there to unveil flashy cybersecurity exploits.

Instead, he showed how Claude, Anthropic’s family of large language models, has quietly outperformed many human competitors in hacking contests — the kind used to train and test cybersecurity skills in a safe, legal environment.

His talk highlighted not only Claude’s surprising wins but also its humorous failures, like drifting into musings on security philosophy when overwhelmed, or inventing fake “flags” (the secret codes competitors need to steal and submit to contest judges to prove they’ve successfully hacked a system).

Lucas wasn’t just trying to get a laugh, though.

He wanted to show that AI agents are already more capable at simulated cyberattacks than many in the cybersecurity world realize – they are fast, and make good use of autonomy and tools.

That makes them a potential tool for criminal hackers or state actors — and means, he argued, that those same tools need to be deployed for defense.

The message reflects Lucas’ role on Anthropic’s Frontier Red Team, an internal group of 15 researchers tasked with stress-testing the company’s most advanced AI systems—probing how they might be misused in areas like biological research, cybersecurity, and autonomous systems, with a particular focus on risks to national security.

Anthropic, which was founded in 2021 by ex-OpenAI employees, has cast itself as a safety-first lab convinced that unchecked models could pose “catastrophic risks.” But it is also one of the fastest-growing technology companies in history: This week Anthropic announced it had raised a fresh $13 billion at a $183 billion valuation and had passed $5 billion in run-rate revenue.

Unlike similar groups at other labs, Anthropic’s red team is also explicitly tasked with publicizing its findings.

That outward-facing mandate reflects the team’s unusual placement inside Anthropic’s policy division, led by co-founder Jack Clark.

Other safety and security teams at Anthropic sit under the company’s technical leadership, including a safeguards team that works to improve Claude’s ability to identify and refuse harmful requests, such as those that might negatively impact a user’s mental health or encourage self-harm.

According to Anthropic, the Frontier Red Team does the heavy lifting towards the company’s stated purpose of “building systems that people can rely on and generating research about the opportunities and risks of AI.” Its work underlies Anthropic’s Responsible Scaling Policy (RSP), the company’s governance framework that triggers stricter safeguards as models approach various dangerous thresholds.

It does so by running thousands of safety tests, or “evals,” in high-risk domains—results that can determine when to impose tighter controls.

For example, it was the Frontier Red Team’s assessments that led Anthropic to release its model Claude Opus 4 under what the company calls “AI Safety Level 3”—the first model released under that designation—as a “precautionary and provisional action.” The designation means the model could significantly enhance a user’s ability to obtain, produce or deploy chemical, biological, radiological or nuclear weapons by providing better instructions than existing, non-AI resources like search engines.

It is also a system that begins to show signs of autonomy, including the ability to act on a goal.

By designating Opus 4 as ASL-3, Anthropic flipped on stronger internal security measures to prevent someone from obtaining the model weights—the neural network “brains” of the model—and visible safeguards to block the model from answering queries that might help someone build a chemical or nuclear weapon.

Telling the world about AI risks is good for policy—and business

The red team’s efforts to amplify its message publicly have grown louder in recent months: It launched a standalone blog last month, called Red, with posts ranging from a nuclear-proliferation study with the Department of Energy to a quirky experiment in which Claude runs a vending machine business.

Lucas’ DEF CON talk was also its first public outing at the conference.

“As far as I know, there’s no other team explicitly tasked with finding these risks as fast as possible—and telling the world about them,” said Frontier Red Team leader Logan Graham, who, along with Lucas, met with Fortune at a Las Vegas cafe just before DEF CON.

“We have worked out a bunch of kinks about what information is sensitive and not sensitive to share, and ultimately, who’s responsible for dealing with this information.

It’s just really clear that it’s really important for the public to know this, and so there’s definitely a concerted effort.” Experts in security and defense point out that the work of the Frontier Red Team, as part of Anthropic’s policy organization, also happens to be good for the company’s business—particularly in Washington, DC.

By showing it is out front on national-security risks, Anthropic turns what could be seen as an additional safety burden into a differentiator.

“In AI, speed matters — but trust is what often accelerates scale,” said Wendy R. Anderson, a former Department of Defense staffer and defense executive.

“From my years in the defense world, I’ve observed that companies that make safety and transparency core to their strategy don’t just earn credibility with regulators, they help shape the rules…it determines who gets access to the highest-value, most mission-critical deployments.” Jen Weedon, a lecturer at Columbia University’s School of International and Public Affairs who researches best practices in red-teaming AI systems, pointed out that where a red team sits in the organizational chart shapes its incentives.

“By placing its Frontier Red Team under the policy umbrella, Anthropic is communicating that catastrophic risks aren’t just technical challenges—they’re also political, reputational, and regulatory ones,” she said.

“This clearly gives Anthropic leverage in Washington, but it also shows how security and safety talk doubles as strategy.” The environment for AI in the US right now, particularly for public sector use cases, “seems to be open for the shaping and taking,” she added, pointing to the Trump Administration’s recently announced AI Action Plan, which is “broad in ambition but somewhat scant in details, particularly around safeguards.” Critics from across the industry, however, have long taken aim at Anthropic’s broader efforts on AI safety.

Some, like Yann LeCun, chief scientist at Meta’s Fundamental AI Research lab, argue that catastrophic risks are overblown and that today’s models are “dumber than a cat.” Others say the focus should be on present-day harms (such as encouraging self-harm or the tendency of LLMs to reinforce racial or gender stereotypes), or fault the company for being overly secretive despite its safety branding.

Nvidia’s Jensen Huang has accused CEO Dario Amodei of regulatory capture—using his stance on AI safety to scare lawmakers into enacting rules that would benefit Anthropic at the expense of its rivals.

He’s even claimed Amodei is trying to “control the entire industry.” (Amodei, on a recent technology podcast, called Huang’s claims “an outrageous lie” and a “bad-faith distortion.”) On the other end of the spectrum, some researchers argue Anthropic isn’t going far enough.

UC Berkeley’s Stuart Russell told the Wall Street Journal, “I actually think we don’t have a method of safely and effectively testing these kinds of systems.” And studies carried out by the nonprofits SaferAI and the Future of Life Institute (FLI) said that top AI companies such as Anthropic maintain “unacceptable” levels of risk management and show a “striking lack of commitment to many areas of safety.” Inside Anthropic, though, executives argue that the Frontier Red Team, working alongside the company’s other security and safety teams, exists precisely to surface AI’s biggest potential risks—and to force the rest of the industry to reckon with them.

Securing the world from rogue AI models

Graham, who helped found Anthropic’s Frontier Red Team in 2022, has, like others in the group, a distinctive resume: After studying economics in college, he earned a Ph.D. in machine learning at Oxford as a Rhodes Scholar before spending two years advising the U.K. Prime Minister on science and technology.

Graham described himself as “AGI-pilled,” which he defines as someone who believes that AI models are just going to keep getting better.

He added that while the red team’s viewpoints are diverse, “the people who select into it are probably, on average, more AGI-pilled than most.” The eclectic team includes a bioengineering expert, as well as three physicists, though Graham added that the most desired skill on the team is not a particular domain or background, but “craftiness” – which obviously comes in handy when trying to outsmart an AI into revealing dangerous capabilities.

The Frontier Red Team is “one of the most unique groups in the industry,” said Dan Lahav, CEO of a stealth startup which focuses on evaluating frontier models (his firm conducted third-party tests on Anthropic’s Claude 4, as well as OpenAI’s GPT-5).

To work effectively, he said, its members need to be “hardcore AI scientists” but also able to communicate outcomes clearly—“philosophers blended with AI scientists.” Calling it a “red team” is a spin on traditional security red teams – security units that stress-test an organization’s defenses by playing the role of the attacker.

Anthropic’s Frontier Red Team, Graham said, works differently. The key difference, he explained, is what they’re protecting against, and why.

Traditional security red teams protect an organization from external attackers by finding vulnerabilities in its systems.

Anthropic’s Frontier Red Team, on the other hand, is designed to protect society from the company’s own products, its AI models, by discovering what these systems are capable of before those capabilities become dangerous.

They work to understand: “What could this AI do if someone wanted to cause harm?” and “What will AI be capable of next year that it can’t do today?” For example, Anthropic points out that nuclear know-how, like AI, can be used for good or for harm — the same science behind power plants can also inform weapons development.

To guard against that risk, the company recently teamed up with the Department of Energy’s National Nuclear Security Administration to test whether its models could spill sensitive nuclear information (they could not).

More recently, they’ve gone a step further, co-developing a tool with the agency that flags potentially dangerous nuclear-related conversations with high accuracy.

Anthropic isn’t alone in running AI safety-focused “red team” exercises on its AI models: OpenAI’s red-team program feeds into its “Preparedness” framework, and Google DeepMind runs its own safety evaluations.

But at those other companies, the red teams sit closer to technical security and research, while Anthropic’s placement under policy underscores what can be seen as a triple role — probing risks; making the public aware of them; and, as a kind of marketing tool, reinforcing the company’s safety bona fides.

The right incentive structure

Jack Clark, who before co-founding Anthropic led policy efforts at OpenAI, told Fortune that the Frontier Red Team is focused on generating the evidence that guides both company decisions and public debate—and that placing it under his policy organization was a “very intentional decision.” Clark stressed that this work is happening in the context of rapid technological progress.

“If you look at the actual nology, the music hasn’t stopped,” he said.

“Things keep advancing, perhaps even more quickly than they did in the past.” In official submissions from Anthropic to the White House, he pointed out, the company has been consistent in saying it expects “really powerful systems by late 2026 or early 2027.” That prediction, he explained, comes directly from the kinds of novel tests the Frontier Red Team is running.

Some of what the team is studying are things like complex cyber-offense tasks, he explained, which involve long-horizon, multi-step problem-solving.

“When we look at performance on these tests, it keeps going up,” he said. “I know that these tests are impossible to game because they have never been published and they aren’t on the internet.

When I look at the scores on those things, I just come away with this impression of continued, tremendous and awesome progress, despite the vibes of people saying maybe AI is slowing down.” Anthropic’s bid to shape the conversation on AI safety doesn’t end with the Frontier Red Team — or even with its policy shop.

In July, the company unveiled a National Security and Public Sector Advisory Council stocked with former senators, senior Defense officials, and nuclear experts.

The message is clear: safety work isn’t just about public debate, it’s also about winning trust in Washington.

For the Frontier Red Team and beyond, Anthropic is betting that transparency about risk can translate into credibility with regulators, government buyers, and enterprise customers alike.

“The purpose of the Frontier Red Team is to create better information for all of us about the risks of powerful AI systems – by making this available publicly, we hope to inspire others to work on these risks as well, and build a community dedicated to understanding and mitigating them,” said Clark.

“Ultimately, we expect this will lead to a far larger market for AI systems than exists today, though the primary motivating purpose is for generating safety insights rather than product ones.”

The real test

The real test, though, is whether Anthropic will still prioritize safety if doing so means slowing its own growth or losing ground to rivals, according to Herb Lin, senior research scholar at Stanford University’s Center for International Security and Cooperation and research fellow at the Hoover Institution.

“At the end of the day, the test of seriousness — and nobody can know the answer to this right now — is whether the company is willing to put its interests second to legitimate national security concerns raised by its policy team,” he said.

“That ultimately depends on the motivations of the leadership at the time those decisions arise. Let’s say it happens in two years — will the same leaders still be there?

We just don’t know.” While that uncertainty may hang over Anthropic’s safety-first pitch, inside the company, the Frontier Red Team wants to show there’s room for both caution and optimism.

“We take it all very, very seriously so that we can find the fastest path to mitigating risks,” said Graham.

Overall, he adds, he’s optimistic: “I think we want people to see that there’s a bright future here, but also realize that we can’t just go there blindly.

We need to avoid the pitfalls.”
