
Seven Co-Founders Panel: How Anthropic Was Born
“No one wants to start a business, but everyone feels they have to.”
Compiled & Translated by TechFlow

Guests: Chris Olah, Jack Clark, Daniela Amodei, Sam McCandlish, Tom Brown, Dario Amodei, Jared Kaplan — Co-founders of Anthropic
Podcast Source: Anthropic
Original Title: Building Anthropic | A Conversation with Our Co-founders
Air Date: December 20, 2024
Key Takeaways
Last week, Anthropic experienced two consecutive incidents:
First, nearly 3,000 internal documents were accidentally exposed to public access due to a misconfiguration in its content management system (CMS). Then, Claude Code v2.1.88 was published on npm with a 59.8 MB source map containing 510,000 lines of source code—fully exposed and unprotected.
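For practitioners who want to audit their own dependencies for this failure mode, here is a minimal sketch in Python; the package path is illustrative, and a real audit should also catch inline (data-URL) source maps.

```python
from pathlib import Path

def find_source_maps(package_dir: str) -> list[Path]:
    """Flag standalone .map files and JS files referencing an external source map."""
    root = Path(package_dir)
    hits = list(root.rglob("*.map"))  # standalone source map files
    for js in root.rglob("*.js"):
        # The sourceMappingURL directive conventionally sits near the end of the file.
        tail = js.read_text(errors="ignore")[-300:]
        if "sourceMappingURL=" in tail:
            hits.append(js)
    return hits

if __name__ == "__main__":
    # Illustrative path; point this at any installed npm package.
    for path in find_source_maps("node_modules/@anthropic-ai/claude-code"):
        print(path)
```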
A company that has “safety” encoded into its DNA repeatedly stumbled on its own infrastructure operations—an irony impossible to ignore.
Yet before rushing to mock, consider listening to an internal conversation among Anthropic’s seven co-founders from over a year ago. Recorded in December 2024, this podcast features the founders reflecting on how the company was founded, how the Responsible Scaling Policy (RSP) was painstakingly developed, why the word “safety” shouldn’t be used casually, and CEO Dario Amodei’s oft-quoted line:
“If a building triggers its fire alarm every week, it is, in fact, an extremely unsafe building.”
Hearing that line now carries a very different resonance.
Meet the Seven Co-founders
Dario Amodei|CEO; former VP of Research at OpenAI; trained in neuroscience; final decision-maker on Anthropic’s strategy and safety roadmap. He speaks most frequently in this discussion.
Daniela Amodei|President; Dario’s sister. Previously spent five and a half years at Stripe, leading its Trust & Safety team, and earlier worked in nonprofits and international development. She leads Anthropic’s organizational design and external communications.
Jared Kaplan|Former physics professor turned AI researcher; one of the core authors of scaling laws. Often offers outsider perspectives—and jokes he entered AI because “physics got boring.”
Chris Olah|A leading figure in interpretability research; joined the Bay Area AI community at age 19; previously worked at Google Brain and OpenAI. The most technically idealistic member of Anthropic’s founding team.
Tom Brown|Lead author of the GPT-3 paper; now manages Anthropic’s compute resources. His perspective leans toward engineering and infrastructure; he recounts his shift from skepticism about AI’s rapid advancement to full conviction.
Jack Clark|Former Bloomberg tech journalist; Head of Policy & Public Affairs at Anthropic. Serves as moderator in this conversation, guiding transitions and pressing follow-up questions.
Sam McCandlish|Research co-founder; speaks least but consistently delivers incisive, decisive insights—the “finishing move” role.
Highlights of Key Insights

Why Build AI: From Physics Fatigue to “Seeing Enough to Believe”
Jared Kaplan: “I’d been doing physics for a long time—and it got a bit dull. I also wanted to work more closely with friends—so I switched to AI.”
Dario Amodei: “I don’t think I ever explicitly ‘convinced’ you. I just kept showing you AI model outputs—demonstrating their generality, not just narrow applicability. At some point, after enough exposure, you said, ‘Hmm, this looks right.’”
Betting Against Consensus: Much of What Passes for Consensus Is Just Herd Behavior Disguised as Maturity
Jared Kaplan: “Many AI researchers were psychologically scarred by the AI winter—ambition felt forbidden.”
Dario Amodei: “My deepest lesson over the past decade? Much of what people call ‘common knowledge’ is actually herd behavior masquerading as maturity. After seeing consensus flip overnight a few times, you say: No—we’re betting on this. Even if you’re only 50% right, you’ll contribute something no one else does.”
Safety and Scaling Are Intertwined
Dario Amodei: “One reason we scaled our models early was that they needed to be sufficiently intelligent for RLHF to work at all. That remains our core belief today: safety and scaling are deeply entangled.”
RSP—the Responsible Scaling Policy—is Anthropic’s “Constitution”
Tom Brown: “For Anthropic, RSP functions like our constitution—a foundational, guiding document demanding substantial time and effort to refine iteratively.”
Dario Amodei: “RSP halts any initiative failing to meet safety standards. We’re not mouthing slogans—we embed safety concretely into every stage.”
When Fire Alarms Ring Too Often, No One Runs When the Building Actually Catches Fire
Daniela Amodei: “We can’t casually deploy the word ‘safety’ to steer project timelines. Our real goal is clarity: ensuring everyone knows exactly what we mean by ‘safety.’”
Dario Amodei: “What truly undermines safety is frequent ‘safety drills.’ If a building triggers its fire alarm weekly, it is, in fact, extremely unsafe.”
‘Noble Failure’ Is a Trap
Chris Olah: “There’s a notion that the most moral act is sacrificing other goals for safety—to demonstrate purity of intent. But this approach is self-defeating: it surrenders decision-making authority to those who don’t prioritize safety.”
The Co-founders Committed to Donate 80% of Their Income
Tom Brown: “We collectively pledged to donate 80% of our income to initiatives advancing societal progress—a commitment everyone supported without hesitation.”
No One Wanted to Start a Company—But All Felt It Was Necessary
Sam McCandlish: “None of us initially set out to found a company. We simply felt it was our responsibility—the only viable path to ensure AI develops in the right direction.”
Daniela Amodei: “Our mission is both crystal clear and uncompromising—a rarity in the tech industry.”
Interpretability: Neural Networks Harbor an Entire “Artificial Biology”
Chris Olah: “Neural networks are profoundly beautiful—and contain far more beauty than we’ve yet glimpsed. I sometimes imagine walking into a bookstore ten years from now and buying a textbook titled *The Biology of Neural Networks*, filled with astonishing revelations.”
AI for Strengthening Democracy—Not Enabling Authoritarianism
Dario Amodei: “We fear AI, if misdeveloped, could become a tool of authoritarianism. How do we instead make AI a catalyst for freedom and self-determination? This domain is just as critical as biology or interpretability.”
From White House Meetings to Nobel Prizes: AI’s Impact Has Long Surpassed the Tech Bubble
Jared Kaplan: “In 2018, you wouldn’t have imagined the President summoning you to the White House to discuss language models.”
Dario Amodei: “We’ve already seen the Nobel Prize in Chemistry awarded for AlphaFold. We must build tools capable of enabling hundreds more AlphaFolds.”
Why Study AI?
Jack Clark: Why did we start building AI in the first place? Jared—why did you get into AI?
Jared Kaplan:
I’d been doing physics for a long time—and it got a bit dull. I also wanted to work more closely with friends—so I switched to AI.
Tom Brown:
I assumed Dario convinced you.
Dario Amodei:
I don’t think I ever explicitly “convinced” you. I just kept showing you AI model outputs—demonstrating their generality, not just narrow applicability. At some point, after enough exposure, you said, “Hmm, this looks right.”
Jack Clark: Chris—when you were doing interpretability research, did you meet everyone at Google?
Chris Olah:
No. Actually, I met several of you when I first came to the Bay Area at age 19. I remember meeting Dario and Jared then—they were postdocs, and I thought they were incredibly cool. Later, at Google Brain, Dario joined and we sat side-by-side for a while. I also worked with Tom there, and later at OpenAI, collaborated with all of you.
Jack Clark:
I recall interviewing Dario at a conference in 2015—and Google PR told me I had to read all your papers first.
Dario Amodei:
I was writing *Concrete Problems in AI Safety* at Google then.
Sam McCandlish:
Before I started working with you, you invited me to your office to walk me through AI broadly. I remember thinking afterward, “This is far more serious than I realized.” You spoke about “the big blob of compute,” parameter counts, human neuron scale—you laid it all out.
Breakthrough Scaling
Jack Clark: I remember when scaling took off at OpenAI: making models bigger began working surprisingly well across many projects, from GPT-2 to the scaling-laws work to GPT-3. We just kept getting closer.
Dario Amodei: We were that group—the ones who “got things done.”
Jared Kaplan: We were also excited about safety. There was this idea back then: AI would be powerful, but perhaps wouldn’t understand human values—or even communicate with us. Language models, however, implicitly require understanding vast amounts of tacit knowledge.
Dario Amodei:
And beyond language models, RLHF—scaling up models was partly motivated by needing them smart enough for RLHF to work at all. That remains our core belief: safety and scaling are deeply entangled.
Chris Olah:
Yes—scaling work was effectively part of the safety team then. Because to take safety seriously, you first need to forecast AI trends.
Jack Clark: I remember being in a UK airport, sampling fake news from GPT-2 and sending it via Slack to Dario with: “This actually works—it could have massive policy implications.” I recall Dario replying, “Yes.”
We did a lot of release-related work after that—it was wild.
Daniela Amodei:
I remember that release period—it was our first real collaboration, around GPT-2.
Jack Clark:
I think that helped us immensely. First, we did something “a bit odd but safety-oriented”—then later, Anthropic: a larger-scale, equally odd—but safety-oriented—endeavor.
The Early Days of AI
Tom Brown: Let’s return to *Concrete Problems*. I joined OpenAI in 2016—you and I were among the earliest hires. At the time, that paper felt like the first mainstream AI safety paper. How did it come about?
Dario Amodei:
Chris knows—he was involved. We were at Google then—I forget what my main project was—but this paper was essentially procrastination.
We wanted to list open problems in AI safety. Back then, AI safety was always discussed abstractly—we aimed to ground it in real ML practice. This line of work has now spanned six or seven years—but at the time, it seemed like a weird idea.
Chris Olah:
In some sense, it was almost a political project. Many dismissed safety as unimportant. We aimed to compile a list of problems widely accepted as reasonable—many drawn from existing literature—and secure cross-institutional, credible signatories.
I spent weeks consulting over twenty researchers at Google Brain to build support for publication. Judged purely on problem selection, some items may not hold up today. But viewed as consensus-building, as proof that “these are real issues worth taking seriously,” it was a pivotal moment.
Jack Clark:
You eventually enter a very strange sci-fi world. I recall Anthropic’s early talks about Constitutional AI—Jared saying, “We write a constitution for language models, and their behavior changes.” It sounded insane then. Why did you think it was feasible?
Jared Kaplan:
Dario and I debated this extensively. I believe simple methods in AI often work remarkably well. The earliest version was complex—but we kept simplifying until it became: leverage the model’s strength in multiple-choice tasks, give it clear prompts about what to look for—and that’s sufficient. Then we can write principles down directly.
Dario Amodei:
This circles back to “the big blob of compute,” the “bitter lesson,” and the “scaling hypothesis”: if you give AI a clear objective and data, it learns. A set of instructions, a set of principles—language models can read them, compare them against their own behavior, and align training objectives accordingly. So Jared and I concluded: yes, it’s feasible—with careful iterative tuning.
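To make this concrete, here is a minimal sketch of the critique-and-revision loop that Constitutional AI describes, assuming only a generic generate(prompt) -> str completion function; the principles shown are stand-ins, not Anthropic’s actual constitution.

```python
from typing import Callable

# Stand-in principles; the real constitution is longer and more carefully worded.
PRINCIPLES = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that could assist with dangerous or illegal activity.",
]

def constitutional_revision(generate: Callable[[str], str], user_prompt: str) -> str:
    """Draft a response, then critique and revise it against each principle."""
    response = generate(user_prompt)
    for principle in PRINCIPLES:
        critique = generate(
            f"Principle: {principle}\n"
            f"Response: {response}\n"
            "Point out any way the response conflicts with the principle."
        )
        response = generate(
            f"Response: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response so it fully complies with the principle."
        )
    return response
```

The multiple-choice strength Jared mentions enters at the preference stage: the model is shown two candidate responses, asked which better satisfies a principle, and those AI-generated labels stand in for human feedback.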
Jared Kaplan:
It felt strange early on. Coming from physics, where everyone’s now excited about AI, it’s easy to forget the atmosphere then. When Dario and I discussed these ideas, many AI researchers were still psychologically scarred by the AI winter—ambition felt forbidden. To discuss safety, you first had to believe AI could become extremely powerful and useful—but there was an anti-ambition taboo. Physicists have an advantage: “arrogance”—they routinely pursue ambitious goals and speak confidently about grand visions.
Dario Amodei:
I think that’s true. In 2014, many things simply couldn’t be said aloud. It reflects academia’s broader risk aversion—industrial AI inherited this mindset—and I think it wasn’t until around 2022 that things shifted.
Chris Olah:
There are two kinds of “conservatism”: one takes risks seriously; the other treats believing in success as arrogance. We lived under the latter. History repeats itself: in 1939, Fermi dismissed the possibility of a nuclear chain reaction, while Szilard and Teller took the risk seriously.
Dario Amodei:
My deepest lesson over the past decade? So much of what passes for “common knowledge” is herd behavior disguised as maturity. After seeing consensus flip overnight a few times, you say: No—we’re betting on this. Maybe you’re wrong—but filtering out noise lets you bet decisively. Even if you’re only 50% right, you’ll contribute something no one else does.
Shifting Public Attitudes Toward AI
Jared Kaplan: Today, similar dynamics play out on safety issues: outside consensus assumes many safety problems won’t naturally emerge from technology—but our research at Anthropic shows they absolutely do.
Daniela Amodei:
Yet over the past 18 months, this is changing—and global sentiment toward AI is shifting visibly. In user research, ordinary users increasingly express concern about AI’s broad impact on society.
Sometimes it’s jobs, bias, toxicity; sometimes it’s “Will it destabilize the world? Change how humans collaborate?”—that last one surprised even me.
Sam McCandlish:
Oddly, the ML research community is often more skeptical than the general public that AI will become “very strong.”
Jared Kaplan:
In 2023, Dario and I went to the White House. Vice President Harris and Commerce Secretary Raimondo basically said: “We’re watching you closely—AI is huge, and we’re taking it seriously.” But in 2018, you wouldn’t have imagined “the President calling you to the White House to discuss language models.”
Tom Brown:
Interestingly, many of us entered this field when it still looked uncertain—like Fermi doubting the atomic bomb. Some evidence suggested it might be built—but much also suggested it wouldn’t. Yet he chose to try, because if true, the impact would be enormous.
Between 2015–2017, evidence mounted that AI could be transformative. In 2016, I spoke with my advisor: “I’ve done startups—I want to work on AI safety—but my math isn’t strong enough. What should I do?” Some said you needed mastery of decision theory; others insisted “crazy AI events won’t happen”—few supported it.
Jack Clark:
In 2014, I was labeled crazy for reporting on ImageNet trends. In 2015, I tried writing about NVIDIA’s GPU focus in papers—and got called crazy again. In 2016, leaving journalism for AI, I received emails saying, “You’ve made the biggest mistake of your life.” From many angles, betting hard on “scaling will succeed” truly felt insane.
Jared Kaplan: How did you decide? Were you conflicted?
Jack Clark:
I placed a reverse bet: I asked for a full-time AI journalist role at double the salary, knowing they’d refuse. I woke up the next day and resigned. I was reading arXiv papers every day, and I sensed something wildly significant unfolding. At some point, you go all-in with high conviction.
Tom Brown:
I wasn’t that decisive—I wavered for six months.
Daniela Amodei:
And back then, the idea that “engineers could meaningfully advance AI” wasn’t mainstream. It was “only researchers do AI”—so your hesitation was understandable.
Tom Brown:
Later, OpenAI said, “You can help with AI safety through engineering”—and that’s what brought me in. Daniela—you managed me at OpenAI. Why did you join?
Daniela Amodei:
I’d spent five and a half years at Stripe—Greg was my manager there. I introduced Greg to Dario. When he was founding OpenAI, I told him: “The smartest person I know is Dario. If you get him on board, you’ll be incredibly lucky.” And Dario joined OpenAI.
Like you, I was also contemplating what to do after Stripe. I’d joined Stripe because, after working in nonprofits and international development, I felt I needed more skills—I even thought I’d return to that space.
Before joining Stripe, I felt I lacked the capacity to help people less privileged than myself. So I explored tech companies, seeking new ways to drive broader impact—and OpenAI stood out. It was a nonprofit pursuing an important, visionary goal.
I’ve always believed in AI’s potential—partly because of my familiarity with Dario—and they clearly needed help managing operations. So the role aligned perfectly with my background. I thought: “This is a nonprofit, filled with brilliant, well-intentioned people—but their operations feel chaotic.” That challenge excited me—I wanted to jump in.
At the time, I felt like a Swiss Army knife—managing team members, leading technical teams, overseeing org expansion, working on language teams, handling policy matters, collaborating with Chris. The caliber of talent here made me eager to join and help build efficiency and order.
Jack Clark: I recall you saying, after GPT-3, “Have you heard of trust and safety?”
Daniela Amodei:
I led the trust and safety team at Stripe. For technologies like this, you’ll need to consider trust and safety—it’s the bridge between AI safety research and practical daily work: how to make models genuinely safe.
Declaring “this technology will have major future impact” is vital. Simultaneously, we need pragmatic daily work laying foundations for higher-risk scenarios ahead.
Responsible Scaling Policy: Ensuring Safe AI Development
Jack Clark: Let’s discuss how the Responsible Scaling Policy (RSP) emerged—why we conceived it, and how we apply it today—especially given our current work on model trust and safety. Who first proposed RSP?
Dario Amodei:
Paul Christiano and I originated the idea around late 2022. Initially, we asked: Should we pause scaling before reaching certain sizes until solving specific safety problems?
But we soon realized that pausing at a threshold and then lifting restrictions felt odd. Instead, we defined a series of capability thresholds, each requiring rigorous testing to assess whether the model has reached capabilities that call for stronger safeguards.
Each threshold demands stricter safety and assurance measures. Yet early on, we agreed: this should ideally be implemented by third parties. If one company alone enforces it, others may reject adoption. Paul personally designed the framework. Over time, many details evolved—and our team continuously refined implementation.
Once Paul formalized the concept, he announced it—and within one or two months, we published our own version. Many team members contributed deeply. I wrote at least one early draft—but the document underwent numerous revisions.
Tom Brown:
RSP functions like our “constitution”—a foundational, guiding document demanding substantial time and effort to refine iteratively for accuracy and completeness.
Daniela Amodei:
I find RSP’s evolution at Anthropic fascinating. It progressed through stages requiring diverse skills. Grand principles—led by Dario, Paul, Sam, and Jared—asked: “What are our core principles? What message do we convey? How do we validate our direction?”
But equally vital is hands-on execution: iterating, evaluating, adjusting details. For instance, we projected certain outcomes at a given safety level—if unmet, we re-evaluate and ensure accountability for results.
Organizational redesign also mattered, e.g., restructuring RSP governance for clearer accountability. I love the constitutional analogy. Like the U.S., which built the courts, the Supreme Court, the presidency, and Congress partly to uphold the Constitution, Anthropic’s RSP is following a similar path.
Sam McCandlish:
This reflects our core view on safety: safety problems are solvable. It’s complex and demanding—but feasible with sustained effort.
Like automotive safety, institutions took decades to mature. But our question is: Do we have enough time? So we must rapidly identify essential AI safety institutions, pioneer them here—and ensure they’re replicable elsewhere.
Dario Amodei:
This also unifies internal collaboration: if any unit violates our safety values, RSP surfaces the issue—halting unsafe initiatives. Thus, it constantly reminds everyone that safety is non-negotiable in product development and planning. We’re not chanting slogans—we embed safety concretely in every step. If someone joins and can’t embrace these principles, they’ll struggle to fit in. Either adapt—or find it unsustainable.
Jack Clark:
Over time, RSP grew increasingly vital. We’ve invested thousands of hours—and when explaining RSP to senators, I say: “We’ve built safeguards ensuring our tech resists misuse and remains safe.” Their reaction? “That sounds normal. Doesn’t every company do this?” I chuckle—it’s not universal.
Daniela Amodei:
Beyond aligning team values, RSP boosts transparency. It clearly states our goals—understandable internally and externally. Though imperfect, we continually optimize it.
I emphasize: explicitly naming “our core concerns” matters. We can’t casually wield “safety” to steer work—e.g., “We can’t do X because of safety” or “We must do Y because of safety.” Our real goal is shared clarity on what “safety” means.
Dario Amodei:
Long-term, frequent “safety drills” undermine safety itself. As I’ve said: “If a building triggers its fire alarm weekly, it’s extremely unsafe”—because during real fires, no one reacts. We must calibrate alerts precisely.
Chris Olah:
Viewed differently, RSP creates healthy incentives across layers. Internally, it aligns each team’s incentives with safety goals—if progress stalls, related work pauses.
Externally, RSP fosters healthier incentives than alternatives. Suppose we must declare: “Our model has advanced—but we can’t guarantee safety yet.” RSP provides a clear framework and evidence base for such decisions—established in advance and easily understood. Early RSP drafts didn’t reveal its full potential—but now I see it as more effective than any alternative I can imagine.
Jared Kaplan:
I agree—but this may underestimate challenges in crafting correct policies, evaluation criteria, and boundary definitions. We’ve iterated heavily—and continue refining. A tough problem: for emerging tech, distinguishing danger from safety is often unclear. We face vast gray zones. These challenges thrilled me during RSP’s inception—and still do. Yet implementing it effectively is more complex and demanding than I first imagined.
Sam McCandlish:
Gray zones are inherently unpredictable—they’re everywhere. Only implementation reveals them. So our goal is to implement everything early—surfacing issues fast.
Dario Amodei:
You need three or four iterations to get it right. Iteration is powerful—you almost never nail it on the first try. With rising risk, iterate early—not wait until the end.
Jack Clark:
Simultaneously, you must build internal institutions and processes. Details evolve—but cultivating team execution capability is paramount.
Tom Brown:
I manage Anthropic’s compute resources. Communicating externally requires navigating divergent views on tech speed. I initially doubted rapid progress—but changed my mind, so I empathize deeply. RSP helps especially with those expecting slow development: “We don’t need extreme safety measures until tech becomes urgent.” If they say, “Nothing urgent will happen for years,” I reply: “Fine—we’ll hold off on extremes.” This smooths external dialogue.
Jack Clark:
Where else does RSP impact the team?
Sam McCandlish:
Everything centers on evaluation—every team conducts evaluations. The training team constantly evaluates: has this model become capable enough to pose danger?
Daniela Amodei:
This means measuring model performance against RSP benchmarks—including checking for red flags.
Sam McCandlish:
Evaluating minimum capability is relatively easy, but assessing maximum capability is extremely hard. So we invest heavily in research: “Can this model execute dangerous tasks? Might elicitation methods we haven’t considered (clever prompting, search over many samples, tool use) unlock highly dangerous behavior?”
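As a rough illustration of how such capability evaluations can gate further scaling and deployment, here is a hedged sketch; the eval names, scores, thresholds, and ASL mapping below are invented for illustration and are not Anthropic’s actual harness.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    name: str         # e.g. an autonomy or CBRN-uplift eval (names are illustrative)
    score: float      # measured capability on the eval, normalized to 0..1
    threshold: float  # red line above which stronger safeguards are required

def required_asl(results: list[EvalResult], current_asl: int) -> int:
    """Return the AI Safety Level whose safeguards must be in place."""
    tripped = [r for r in results if r.score >= r.threshold]
    return current_asl + 1 if tripped else current_asl

def may_proceed(results: list[EvalResult], current_asl: int, implemented_asl: int) -> bool:
    """Training or deployment halts until safeguards match the required level."""
    return implemented_asl >= required_asl(results, current_asl)

# Example: one eval trips its threshold, so ASL-3 safeguards become a precondition.
results = [
    EvalResult("autonomy-suite", score=0.72, threshold=0.70),
    EvalResult("cbrn-uplift", score=0.31, threshold=0.60),
]
assert required_asl(results, current_asl=2) == 3
assert not may_proceed(results, current_asl=2, implemented_asl=2)
```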
Jack Clark:
These evaluation tools aid policymaking. “Safety” is abstract—but saying, “We have an evaluation tool determining model deployment eligibility,” lets us partner with policymakers, national security experts, and CBRN specialists to co-develop precise standards. Without concrete tools, collaboration collapses—but clear standards invite participation and validation. Here, RSP’s impact is profound.
Daniela Amodei:
RSP deeply affects my work—and my approach is unique: focusing on its “tone”—its voice. Recently, we overhauled RSP’s tone—previously too technical, even adversarial. I spent much time designing a system inviting participation.
If RSP were easily readable by everyone—like OKRs—great! What’s its main goal? How do we know if achieved? What’s our current AI Safety Level (ASL)—ASL-2 or ASL-3? If everyone grasps key priorities, spotting issues becomes easier. If overly technical—only a few understand—utility plummets.
It’s exciting to see RSP evolving toward accessibility. Now, most—even all—employees, regardless of role, can read it and think: “This makes sense. I want AI developed under these principles—and I understand why these issues matter. If I hit problems, I know roughly what to watch.” We aim for simplicity—like factory workers instantly spotting: “The seatbelt should connect here—but it doesn’t.”
Crucially, establish healthy feedback loops—between leadership, board, departments, and R&D teams. Most problems arise from communication breakdowns or information distortion. If issues stem solely from that—it’s tragic, right? Ultimately, we must translate ideals into tangible, clear, universally understandable practice.
The Anthropic Origin Story
Sam McCandlish: None of us initially wanted to start a company. We felt it was our responsibility—we had to act. It was the only viable path to ensure AI develops correctly—and that’s why we made that commitment.
Dario Amodei:
My initial idea was simple: to invent and explore in ways that benefit people. That led me to AI, which requires massive engineering and funding.
Yet without explicit goals and structures governing a company, plenty gets done, but you end up repeating the tech-industry failures that alienated me: the same people, the same attitudes, the same mindsets. So I realized we needed a radically different approach; it was inevitable.
Jared Kaplan:
Remember grad school—you had a full plan to advance public good through science. Very similar to our current thinking. I recall your “Project Vannevar” aiming for this. As a physics professor, I observed AI’s explosive growth—but academic research couldn’t fund it. I wanted to build an institution with trusted people to steer AI rightly. Honestly, I’d never advise anyone to start a company—nor did I wish to. It’s merely a means to an end. Usually, success hinges on caring deeply about a meaningful world goal—and finding the best path to achieve it.
Building a Culture of Trust
Daniela Amodei: I often reflect on our team’s strategic advantages—one surprising yet vital factor is high mutual trust. Aligning a large group behind a shared mission is hard, but at Anthropic we have managed to do it at scale. Leadership and members unite around a clear, pure mission—rare in tech.
I believe our goal carries profound meaning—we didn’t start companies for startup’s sake. We felt compelled. We couldn’t advance our work elsewhere—we had to do it ourselves.
Jack Clark:
With GPT-3’s emergence—and projects we’d all touched, like scaling laws—we saw AI’s trajectory clearly by 2020. We realized delay risked irreversible tipping points. Action was essential to shape this environment.
Tom Brown:
Extending Daniela’s point—I truly believe in our team’s high trust. Everyone joined to contribute meaningfully. We jointly pledged to donate 80% of income to socially impactful causes—a unanimous, unquestioned “Yes, absolutely.” Such trust is rare and precious.
Daniela Amodei:
I see Anthropic as politically neutral. Of course, our worldview differs from average—and I remind myself constantly. Our hiring process and team traits make our culture naturally resistant to “office politics.”
Dario Amodei:
Team cohesion is vital. Product, research, trust & safety, marketing, policy—all pursue one goal. When departments chase divergent goals, chaos ensues. Worse—when teams blame others for undermining their work.
I believe one of our greatest achievements is maintaining organizational coherence. Mechanisms like RSP help—ensuring no department creates problems while others fix them, but all fulfill roles under one unified theory of change.
Chris Olah:
I joined OpenAI when it was a nonprofit, to focus on AI safety. But over time, I realized that structure didn’t fully suit me—forcing tough decisions. I deeply trusted Dario and Daniela’s judgment—but didn’t want to leave. More AI labs aren’t necessarily better for the world—making departure agonizing.
When we finally left, I remained skeptical about starting a company. I advocated for a nonprofit focused on safety research. But pragmatism and honesty about constraints revealed Anthropic as the optimal path.
Dario Amodei:
An early lesson: promise less, deliver more. Stay grounded, confront trade-offs head-on—trust and credibility outweigh any specific policy.
Daniela Amodei:
Anthropic’s uniqueness lies in high trust and unity. Seeing Mike Krieger withhold product launches for safety reasons—and Vinay balancing business needs to ship projects—feels special. Engineers from safety and reasoning teams debate how to make products both safe and usable. This unified purpose and pragmatism defines Anthropic’s most attractive work environment.
Dario Amodei:
A healthy organizational culture means everyone understands and accepts shared trade-offs. Our world isn’t perfect—every decision balances competing interests, rarely satisfying all. But if the whole team unites behind one goal—facing trade-offs together and contributing from their roles—that’s a healthy ecosystem.
Sam McCandlish:
In a sense, it’s a “race to the top.” Yes—a race to the top. Not risk-free—things can go wrong—but we all agree: “This is our choice.”
Competing at AI’s Summit
Jack Clark: Markets are pragmatic—so the more successful Anthropic becomes, the more others emulate our practices. And when success ties directly to safety work, it creates industry “gravity,” pulling others into the competition. Like seatbelts: others copy them, and the whole ecosystem gets healthier.
Dario Amodei:
But saying, “We won’t build this tech, and you shouldn’t either,” fails—it offers no feasible path from the status quo to the future. The world needs the industry, or at least one company, to chart a course from “the technology doesn’t exist” to “the technology exists, is powerful, and is governed by society.” I believe the only way is confronting trade-offs head-on—at the company level first, and ultimately industry-wide.
You must find ways to stay competitive—even lead in areas—while ensuring safety. Achieve that, and industry attraction grows massively—from regulation to top talent recruitment to customer perception—all pushing industry in the same direction. Prove safety and competitiveness coexist—find win-win solutions—and others will emulate.
Jared Kaplan:
That’s why mechanisms like RSP matter. We see the technology’s trajectory and know certain issues demand vigilance—but we must avoid crying wolf or declaring, “Innovation stops here.” We need AI delivering useful, innovative, delightful experiences—while defining clear constraints that ensure safety and convince others they can succeed safely alongside us.
Dario Amodei:
Months after launching RSP, three top AI firms released similar frameworks. Interpretability research is another breakthrough area. We also partner with AI safety institutes—this holistic safety focus drives deep impact.
Jack Clark:
Yes—Frontier Red Teaming was mimicked almost instantly. Good—we want all labs testing high-risk vulnerabilities.
Daniela Amodei:
As Jack noted, customers deeply care about safety. They don’t want hallucinations or easily jailbroken models—they want useful, harmless AI. In client talks, we often hear: “We choose Claude because it’s safer.” This massively impacts markets—we offer trustworthy, reliable models, pressuring competitors.
Chris Olah:
Expanding Dario’s point: a notion claims the most moral act is “noble failure”—sacrificing other goals for safety, acting unrealistically to prove purity. But this is self-defeating.
First, it surrenders decision-making to those who don’t prioritize safety. Second, if you align incentives, place tough decisions where power supports right choices, and ground them in the strongest evidence, you trigger the “race to the top” Dario described. Rather than marginalizing safety advocates, it forces others to follow.
Looking Ahead: The Future of AI
Jack Clark: What excites you about what’s next?
Chris Olah:
Many reasons to get excited about interpretability. Obvious one: safety. Another—emotionally resonant—is that neural networks are profoundly beautiful, holding untold beauty we haven’t yet seen. We treat them as black boxes—disinterested in internals—but probing reveals astonishing structure.
Like viewing biology: some think, “Evolution is boring—a simple process running long, yielding animals.” Yet every animal embodies incredible complexity and structure. Evolution is optimization—like training neural networks. Neural nets harbor entire “artificial biology” structures. Dig deep—and discover wonders.
We’re barely lifting the veil—it’s unbelievable, brimming with undiscovered treasures. We’re opening the door—and what follows will be spectacular. Sometimes I imagine walking into a bookstore in ten years, buying a textbook on neural network interpretability—or true “neural network biology”—filled with astonishing insights. I believe we’ll begin discovering these in the next decade—even years—a wild, awe-inspiring journey.
Jack Clark:
Years ago, if someone said, “Governments will create new agencies to test and evaluate AI systems—professionally and effectively,” you’d doubt it. But it’s happened. Governments built “new embassies” for this novel technology category—I’m excited to see where this goes. It signals national capacity for such societal transformation—not just corporate reliance—and I’m glad to participate.
Daniela Amodei:
I’m already excited—but imagining AI’s future human impact is electrifying. Even now, signs of Claude aiding vaccine development, cancer research, and biology are astonishing. What it does today is incredible—and looking three to five years ahead, imagining Claude solving fundamental human challenges—especially in health—thrills me. Recalling my international development days—if Claude then aided my inefficient work, it would’ve been miraculous.
Tom Brown:
Personally, I love using Claude at work—so lately, I’ve used it at home too. Biggest recent change? Code. Six months ago, I used Claude for zero programming tasks—our team rarely used it for coding either. Now it’s transformed. Last week, at a Y Combinator event, I asked: “How many use Claude for programming?” Nearly 95% raised hands. Almost the entire room—unlike four months ago.
Dario Amodei:
What excites me includes places where consensus seems settled—but is about to shatter—one being interpretability. I see it as key to guiding and securing AI—and revealing deep insights into intelligence optimization and human brain function. I’ve said Chris Olah will win the Nobel Prize in Medicine.
As a former neuroscientist, many unsolved mental illnesses—schizophrenia, mood disorders—I suspect relate to higher-level system failures. Human brains are too complex and inaccessible to study directly. Neural networks aren’t perfect analogs—but far more analyzable and interactive. Over time, they’ll become superior analog tools.
Another area: AI in biology. Biology is extremely complex—many remain skeptical for good reasons—but that skepticism is eroding. We’ve seen the Nobel Prize in Chemistry awarded for AlphaFold—a monumental achievement. We must build tools enabling hundreds of “AlphaFolds.”
Finally, using AI to strengthen democracy. We fear AI misdeveloped could enable authoritarianism. So, how do we make AI a tool for freedom and self-determination? This domain may mature faster than the others—but its importance equals theirs.
Jared Kaplan:
Two points echo your thoughts. First, many join Anthropic driven by immense curiosity about AI science. As AI advances, they recognize we must not only push tech—but understand it deeply and ensure its safety. Working with growing numbers sharing this vision is exciting—and recent technical advances accelerated this consensus.
Second, returning to practicalities—we’ve done extensive AI safety work. But recent developments gave us preliminary awareness of risks from highly advanced systems. This enables direct investigation—via interpretability and other safety mechanisms—into those risks.
This clarifies advanced AI risks—letting us advance our mission scientifically and empirically. So I’m thrilled about the next six months—leveraging insights into advanced systems’ pitfalls to research and avoid them.