
Worried AI could teach people to build bioweapons? Don’t teach it how, say researchers
Key Takeaways
New research shows that scrubbing risky material from AI training data can build safeguards that are harder to bypass—and one author calls out tech giants for keeping such work under wraps.
August 14, 2025, 07:21 PM · Fortune
Eye on AI
By Sharon Goldman, AI Reporter
Sharon Goldman is an AI reporter at Fortune and co-authors Eye on AI, Fortune’s flagship AI newsletter. She has written about digital and enterprise technology for over a decade.
Welcome to Eye on AI! In this edition…teaching Deep Ignorance…Cohere’s big funding and new hire…AI deskilling…Anthropic acquires Humanloop cofounders…ChatGPT market share
What if stopping AI from helping someone build a biological weapon was as simple as never teaching it how? That question had long intrigued Stella Biderman, executive director of the grassroots nonprofit research lab EleutherAI.
In collaboration with the British government’s AI Security Institute, and lead authors Kyle O’Brien and Stephen Casper, Biderman set out to find the answer — something that had never been explored in public before.
In a new paper, Deep Ignorance, the researchers found that filtering risky information out of an AI model’s training data from the start can “bake in” safeguards that are harder to tamper with—even in open-source models that anyone can download and adapt.
Crucially, these protections didn’t noticeably hurt the model’s overall performance.
To test the approach, the team trained versions of an open-source AI model on datasets scrubbed of certain “proxy” information—safe stand-ins for dangerous content, such as material related to bioweapons.
The models trained on cleaner data were less able to produce harmful information, while performing just as well on most other tasks.
In an X thread about the project, Casper said the goal was to make LLMs “not only safe off the shelf, but also resist harmful tampering.” That’s difficult because most safety efforts so far have focused on post-training tweaks—changes made after a model is built.
Those fixes, such as fine-tuning a model’s responses to avoid dangerous outputs, can work in the short term but are easier to undo and can sometimes weaken the model in unintended ways.
Pre-training filters aim to bake in safety from the start, so the model stays safe even if someone tries to tamper with it later.
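In practice, a pre-training filter usually amounts to screening each document before it ever enters the training corpus. The sketch below is a minimal illustration of that idea, not the Deep Ignorance pipeline itself: the blocklist, the hypothetical risk_classifier function, and the threshold are placeholder assumptions standing in for whatever blocklists, trained classifiers, and human review a real lab would use.

```python
# Minimal sketch of pre-training data filtering (illustrative only).
# Assumptions: `risk_classifier` is a hypothetical scoring function and
# BLOCKLIST holds placeholder proxy terms; real pipelines use trained
# classifiers, curated blocklists, and human review.

from typing import Callable, Iterable, Iterator

BLOCKLIST = {"example hazardous term", "another proxy keyword"}  # placeholder proxy terms

def is_risky(doc: str, risk_classifier: Callable[[str], float], threshold: float = 0.5) -> bool:
    """Flag a document if it matches the blocklist or scores above the classifier threshold."""
    text = doc.lower()
    if any(term in text for term in BLOCKLIST):
        return True
    return risk_classifier(doc) >= threshold

def filter_corpus(docs: Iterable[str], risk_classifier: Callable[[str], float]) -> Iterator[str]:
    """Yield only documents that pass the risk screen; everything else never reaches training."""
    for doc in docs:
        if not is_risky(doc, risk_classifier):
            yield doc

if __name__ == "__main__":
    # Trivial stand-in classifier that scores everything as safe, so only the
    # blocklist matters in this toy example.
    corpus = ["a benign physics lecture", "a document containing another proxy keyword"]
    clean = list(filter_corpus(corpus, risk_classifier=lambda d: 0.0))
    print(clean)  # only the benign document survives the filter
```

The point of baking the screen into data preparation, rather than patching behavior afterward, is that a downstream user who fine-tunes the released weights has no filtered-out knowledge to recover.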
Biderman noted that this kind of work is rare in public research because it’s expensive and time-consuming—a barrier for most academic and nonprofit groups.
Private AI companies like OpenAI and Anthropic have the resources, she said, but avoid revealing details of their pretraining processes for competitive reasons and out of concern over copyright risks. “They could absolutely do this, and who knows if they do it,” she said. “They are incredibly secretive, and don’t really tell you anything.” She pointed to OpenAI’s own hints that it uses some filtering in both its recently released open-weights model and in its proprietary GPT-4o.
In the company’s model card for the open-weights model, OpenAI writes: “To improve the safety of the model, we filtered the data for harmful content in pre-training, especially around hazardous biosecurity knowledge, by reusing the CBRN pre-training filters from GPT-4o.” In other words, the company applied the same screening process used in GPT-4o to weed out potentially dangerous chemical, biological, radiological, and nuclear information before training.
For Biderman, Deep Ignorance is meant to go beyond what companies are willing to say publicly. “Having this out in public enables more people to do better,” she said.
She added that she was motivated in part by the industry’s refrain that its massive datasets can’t be documented or scrutinized. “There’s a story that OpenAI especially really likes to tell about how their data is unfathomably large, how could we possibly know what’s in our data,” she said. “That is something that has pissed me off for a long time.
I think demonstrating repeatedly that this is wrong is important.” With that, here’s the rest of the AI news.
Sharon Goldman
sharon.goldman@fortune.com
@sharongoldman

FORTUNE ON AI
GPT-5’s model router ignited a user backlash against OpenAI—but it might be the future of AI – by Sharon Goldman
AI is already creating a billionaire boom: There are now 498 AI unicorns—and they’re worth $2.7 trillion – by Julia Coacci
A flood of AI deepfakes challenges the financial sector, with over 70% of new enrollments to some firms being fake – by Lionel Lim

AI IN THE NEWS
Cohere raises $500 million, hires former Meta AI leader Joelle Pineau
Cohere announced today that it has raised $500 million in an oversubscribed funding round valuing the company at $6.8 billion, led by Inovia Capital and Radical Ventures with backing from AMD Ventures, NVIDIA, PSP Investments, Salesforce Ventures, and others.
Cohere also announced that it had hired former Meta AI leader Joelle Pineau as chief AI officer and Francois Chadwick as chief financial officer. "Having Joelle and Francois join at the same time as we are bringing in this new round of funding is really a game-changer," Cohere co-founder and CEO Aidan Gomez told Fortune. "The rate of growth in 2025 has been absolutely incredible, with companies realizing our security-first approach is fundamentally unique—this supercharges everything we are doing."

AI quickly eroded doctors’ ability to spot cancer, study finds
According to Bloomberg, a new study in The Lancet Gastroenterology and Hepatology offers a cautionary tale about AI in medicine: it can boost performance—but also cause skill erosion.
Researchers found that doctors using AI to spot pre-cancerous colon growths became so reliant on the tool that, when it was removed, their detection rates dropped 20% below pre-AI levels.
The randomized trial, conducted at four endoscopy centers in Poland, suggests over-reliance on AI may make clinicians “less motivated, less focused, and less responsible” when working without it.
The findings come as health systems — including the UK, which recently funded a major AI breast cancer trial — increasingly adopt AI to improve diagnostics.
Anthropic acquires the co-founders and most of the team behind Humanloop. TechCrunch reported that Anthropic has acqui-hired the co-founders and most of the team behind Humanloop, a UK-based startup known for its enterprise-focused AI tooling, including prompt management, model evaluation, and observability.
Around a dozen engineers and researchers—including CEO Raza Habib, CTO Peter Hayes, and CPO Jordan Burgess—will join Anthropic, though the deal did not include Humanloop’s assets or IP.
The hire strengthens Anthropic’s enterprise push by adding talent experienced in building the infrastructure that helps companies run safe, reliable AI at scale.
Humanloop, founded in 2020, has worked with customers including Duolingo, Gusto, and Vanta, and previously raised $7.91 million in seed funding from YC and Index Ventures.
AI CALENDAR
Sept. 8-10: Fortune Brainstorm Tech, Park City, Utah
Oct. 6-10: World AI Week, Amsterdam
Oct. 21-22: TedAI San Francisco
Dec. 2-7: NeurIPS, San Diego
Dec. 8-9: Fortune Brainstorm AI San Francisco
EYE ON AI NUMBERS
78.5%
That is ChatGPT's share of the generative AI market today, according to data from SimilarWeb.
The rest of the field trails far behind: Gemini (8.7%), DeepSeek (4.1%), Grok (2.5%), Perplexity (1.9%), Claude (1.6%), and Copilot (1.2%).
Less than three years after its debut in November 2022, ChatGPT is also the fifth most-visited website in the world—and the fastest-growing, with traffic up 134.9% year over year.

This is the online version of Eye on AI, Fortune's weekly newsletter on how AI is shaping the future of business. Sign up for free.