
Codex Head: "Everyone is a builder" is a terrible idea
Do we still need PMs to build AI products? If so, what do PMs need to do in the AI era?
Author: Founder Park
Andrew Ambrosino is the head of the OpenAI Codex team. He has a design background, has worked in both engineering and product, and has founded startups. The Codex product he is responsible for now has over 5 million weekly active users. He is probably one of the best-placed people to answer the question of how to build products in the AI era.
In his view, when almost everyone in the company can quickly build a functional prototype, the real challenge is no longer "can it be built", but "should this be built".
In a conversation with Lenny, Andrew Ambrosino described OpenAI's internal development process in detail: when AI sharply compresses implementation costs, every link in product development is changing, from project kickoff, documentation, prototyping, and design to role division, team collaboration, and product planning. Which old rules are failing? Which new standards are forming? When implementation is no longer scarce, what is the truly scarce resource?
Key points:
- When 90 people can make 90 product prototypes that all look launch-ready, the most precious thing is taste.
- A hard requirement in Codex team hiring is taste: the ability to distinguish signal from noise in a mass of content. In a world of "infinite tokens", the team does not want to produce garbage.
- If Codex had launched three months earlier, it would have failed completely; the only variable was model improvement. Don't be quick to judge a feature as bad. It might just not be time yet.
- Whether a feature ends up good enough often depends not on the form of the feature itself but on whether the model is smart enough.
- Just as Codex once disrupted ChatGPT, Codex may itself be disrupted by new attempts. Keep a bottom-up exploration culture; you cannot expect the same team to both polish details and disrupt itself.
Below are the highlights of the conversation.
As implementation gets cheaper, taste matters more
Lenny: You've said AI is changing the shape of product work. You now work on what may be the most cutting-edge AI product team in the world. Specifically, how has the way the product team works changed compared with two years ago?
Andrew Ambrosino: As a team lead now, the hardest thing is that the process has been reversed.
Everyone knows how products used to get built: do the research, come up with ideas, make a prototype. Even though we left the waterfall era long ago, the underlying logic stayed the same: implementation is expensive. So before implementing, you eliminate as much risk as possible through docs, research, and prototypes. The basic assumption was that prototyping and design are cheaper than development.
That assumption has now changed completely: anyone can build anything. I really do believe that, starting from zero and just talking to these models, whether ours or another company's, you can build any feature you want. That isn't necessarily the hardest part of software development, but it is very cool.
Give people infinite tokens, and since everyone at OpenAI is proactive and has good ideas, everyone ends up building all kinds of things. If there is a feature the company urgently needs right now, I'm sure there are 90 different, uncoordinated small teams each implementing and trying it at the same time. Of those 90 attempts, which are good? Which parts should be folded into something else? How should the feature be defined? Should it be part of another feature? How many options should the toggle have? Those are the questions.
So the short answer is: it's reversed. It's not that people are doing fundamentally different jobs, or that skills have disappeared, or that roles no longer exist. Implementation is no longer the most expensive part. I'd go so far as to say the most expensive part is taste.
Lenny: So everyone used to write PRDs and strategy docs, and now everyone goes straight to prototypes. Many people in the company have similar ideas, 90 different things appear, and then you pick a direction from among them?
Andrew Ambrosino: Yes, this happens a lot, and not just at OpenAI. You already see many product leaders saying "the PRD is dead, prototypes rule", but I actually completely disagree with that.
Because implementation has become cheap in every medium, it is very tempting to skip the thinking and go straight to a prototype. Especially if you are not an engineer, if you have never written code and never had the interest or the time, you can't help saying: "The PRD is dead, let me just show you what I want."
But I've also noticed the reverse. For engineers, it becomes very tempting to write lots of docs, many of them not worth reading. This isn't to say that people who write docs are bad. The point is that when implementation becomes abundant, choosing the right format to express your view becomes truly important.
If you want to bring product clarity to an ambiguous area, then maybe you really should write a doc. If you want people to get hands-on and stress-test an interaction pattern, make a prototype. The key is choosing the right medium.
Lenny: There's a concept called the primal mark: the painter's first stroke on the canvas, from which everything after extends. Are you saying that a prototype is sometimes the wrong first stroke, because people anchor on it instead of thinking about the bigger picture?
Andrew Ambrosino: Yes. There used to be an implicit signal: how polished something looked told you what stage of the process it was in. If something looked like a launched product, that meant it was late-stage: risks had been eliminated, design had reviewed it, and the business goals made sense.
Now those things are decoupled. In the past you had to thoroughly reduce risk before you could get resources to build; that threshold is gone. So something that is only at the exploration stage can look launch-ready. Visually, it is ready. But it might not be the right direction, might not match the research findings, might not be what users really need, and might not be the best choice for the business.
I don't want to overemphasize taste. But again, knowing what to build, how to present it, how to reach the goal, and which medium to use is becoming the most important ability in every field.
Lenny: "Taste" is a buzzword now. Specifically, what is the "good taste" you're talking about?
Andrew Ambrosino: Taste has to be broken down.
There is an aesthetic part, certainly. There is also a systems-thinking part: how does this thing fit into the whole system? There is a directional part: where are we going, and which theme is this a part of? There is an expressive part: how do we present this information? And part of taste sits at the interaction level: this animation doesn't fit the semantics it is meant to convey; it is too rushed and doesn't match the meaning it wants to express.
These are all very important, but the real question of taste is: if we can build anything, what is the goal, and how do we get there?
Lenny: As AI gets stronger and does more, where will the human brain continue to have value? Taste feels like one of those places. But AI's design output still isn't quite there. Why can't top models do design well?
Andrew Ambrosino: There are some practical reasons, and some harder problems. Design is harder to score than software. Building a feedback loop that teaches a model what good design is takes far more effort than training on whether code compiles, because human taste is part of the feedback mechanism.
Also, the lab has historically prioritized making models good at things that accelerate AI research. Models writing correct code obviously accelerates research; you can't make the same argument for design. I'm not saying design isn't important, but it isn't in that flywheel.
These are practical reasons, and they will go away. Models will become quite good at design. But some things are harder.
First, design is cultural. You remember how last year every new website was copying Linear's design. Getting a model to output a Linear-style website every time is not the hard part. Novelty matters far more in design than in software engineering. In software engineering you want the model to follow known patterns exactly, but design needs a degree of randomness and novelty.
Second, there is the interaction between visual design and code. If the company changes its brand tomorrow, the shallow approach is to update 263 components one by one. The deep approach is to understand that two things that look different actually belong to the same list style and convey the same interaction pattern. Current technology still can't reach that layer of abstraction.
Lenny: Jenny Wen (design lead for Claude Code at Anthropic) said the design process is already dead and you can just build directly. What do you think?
Andrew Ambrosino: I probably agree with Jenny on a lot of things. The formal design process is dead; I agree with her on that. And I wasn't a fan of that process even before AI.
Years ago, when I was running a startup, there was an article called "Case Study Factory" about designers being trained to follow a fixed process and gradually coming to value the process itself over results. If something had gone through the process, two conclusions followed by default. First, it will be good, because the process guarantees quality. Second, even if no one uses it, it is still good, because it went through the process.
User research, diverge, converge: the framework is right, but it was always a bit academic. The premise of that process is that implementation is very expensive and you can only afford to build once, so you have to exhaust the whole problem space and solution space before building.
Later, Figma and Origami appeared, and we pulled interactive prototypes into the process. The problem now is that you can pull all of the implementation to the front of the process. A fully polished prototype looks like it could ship as is. Enough people in the company see it and ask, "Can we launch now?" But you are actually still in early design exploration; it's just that no one says so explicitly.
So saying the design process is dead is both right and wrong. If you tie the process to specific tools and specific day-to-day practices, then yes, it is dead. But awareness of "what stage of the process are we in right now" is more important than ever.
Tying the design process to a specific medium is the dangerous part. Designers now have more tools: you can put things directly into the existing product and run A/B tests. Many companies have "baby" versions of their product, a baby Cursor or a baby Codex: a greatly simplified codebase that can simulate all of the real product's interactions. You can vibe code on it and ask, "What if the sidebar looked like this? What if a panel popped up?" This is the designer's new tool, but it has to be paired with the old awareness: where are you in the process right now?
Roles are merging, but the PM won't disappear
Lenny: Many companies are talking about "the extinction of roles". Do you think the division of labor among PMs, engineers, and designers will disappear completely?
Andrew Ambrosino: Some companies like chasing trends and going to extremes. The danger of eliminating the concept of roles is that you may also eliminate the understanding that these fields are professions with learnable best practices.
I hear many companies say "we want to get rid of product roles" and then "everyone is a builder". I think this is a very bad idea. The result is that product management, a discipline that has accumulated real best practices and learned real lessons the hard way, simply gets thrown out. Someone writes a few lines of code and thinks everything is fine. That's not a good state to be in.
I welcome the disappearance of the "this isn't your field, so you can't touch it" boundary, but it needs balance. Not everyone can do everything, in either breadth or depth, which is also why managers won't disappear.
And every discipline has a skill component. Many engineers don't admit this; they think engineering takes skill and every other role is just "vibes". That's not the case. Being able to use Excel doesn't mean you can go work on the finance team.
I think what is really happening is that collaborating across roles is getting easier, learning other fields' best practices is getting easier, and your effectiveness in a role no longer has to be tied to your ability to use a specific tool.
For a long time I felt I shouldn't be a software engineer, because I don't care about assembly language and I don't want to memorize the TypeScript type system. These roles always had certain thresholds, as if doing the role well meant mastering a particular tool. I think that is starting to dissolve, but people overstate it.
Lenny: Your Codex team does have more role fusion. What does it look like, specifically?
Andrew Ambrosino: On the Codex team we have indeed seen more role fusion than on other teams at the company or in other industries. Part of the reason is that this is a technical product aimed at engineers. So our designers speak the language of engineers, and our product managers are technical and can write code.
Internally we have a way of describing collaboration: the overlap between roles is much larger than it used to be. We no longer define a person by responsibility boundaries like "where design ends and engineering starts", but by the average of the distribution of all their work.
If you lay out everything one person on the design team does, it may include a lot of coding and a lot of product-related work. But take the "average" of that work and the person will still land in an area that leans toward design.
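The "average of all work" idea can be made concrete with a small sketch. Nothing in the interview says the Codex team actually computes this; the work log, the discipline labels, and the function names below are hypothetical and only illustrate how a person who does a lot of coding and product work can still land on the design side once the shares are added up.

```python
from collections import Counter

# Hypothetical work log for one person on the design team:
# (discipline, hours) pairs. The numbers are made up for illustration.
work_log = [
    ("design", 14.0),
    ("engineering", 9.0),
    ("product", 6.0),
    ("design", 7.0),
    ("engineering", 4.0),
]


def work_distribution(log):
    """Return each discipline's share of total hours (fractions that sum to 1)."""
    hours = Counter()
    for discipline, spent in log:
        hours[discipline] += spent
    total = sum(hours.values())
    return {discipline: spent / total for discipline, spent in hours.items()}


def primary_discipline(log):
    """Return the discipline with the largest share of the person's work."""
    shares = work_distribution(log)
    return max(shares, key=shares.get)


shares = work_distribution(work_log)
leaning = primary_discipline(work_log)
```

Here almost half of the logged hours go to engineering and product, yet `leaning` still comes out as design, which is the sense in which a role becomes a center of gravity rather than a boundary.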
Lenny: You've mentioned that product management now works more like zone defense. What does that mean, specifically?
Andrew Ambrosino: If two product managers are working too closely together, that's usually not a good sign. You should lay the team out the way a force-directed layout would and look: where are the gaps, and what is no one covering yet?
In today's world, curation, guidance, and alignment have become the most important work. Everyone is constantly throwing out ideas, the whole environment is full of noise, and the old top-down approach of planning year by year no longer works. We need people to act as gatekeepers of taste and shepherd a thing from concept to product, and that means you have to cover every corner of the company.
So you need to spread the team out and look: who is good at what? Keep people at a certain distance from each other so that coverage is comprehensive enough. Then fill the gaps, for example: "We want to hire an engineer with strong product thinking." What we don't want is a situation where a group of people writes a lot of code first and then the whole product team still has to review it and calibrate it for product consistency. We want everyone to have these abilities; what differs is the direction each person goes deep in.
Lenny: So the most valuable people now are those who can drive something all the way from idea to completion and have the taste to know "this is great"?
Andrew Ambrosino: Yes, I think that is the core change. It also reflects how I understand the relationship between ICs and managers. I'm not saying management will disappear, or that everyone is just an IC, but in some sense everyone now takes on both roles at once.
If you are an IC, you are no longer typing code character by character. You are actually managing something: managing agents, managing the work that has been organized to complete a task. If you are a team manager, you are essentially doing the same thing, just at a different granularity.
When I look for people, beyond solid professional ability, they must also have taste. In a "world of infinite tokens", we cannot be producing garbage. You must be able to distinguish signal from noise in a mass of content.
Every time someone asks how many people are on the Codex team, my answer is: "Somewhere between 10 and several thousand." It sounds like a joke, but in fact everyone's work converges into this product: model research, browser use, model persona, frontend infrastructure, user experience. All of these are part of the product.
But at the same time, we are not receiving PRs (pull requests) from several thousand people every day. The team has a double-digit number of engineers, about half as many designers, and a few product managers, and most of them are ICs. The team's scope of impact is very large, but the management layers are thin. Many people here have founded companies, many worked with a "founder mindset" inside big companies, and many have excellent taste.
The whole Codex app is shaped by a dogfooding loop. We share a common wish: to do as much of our own work as possible inside the app, even if it is not yet the best tool for the job, because only that way does it have a chance of eventually becoming the best tool. We often deliberately leave certain internal processes unimproved and instead let the product itself get better until it can support them. It is a very uncomfortable state to be in. But week by week, it really does keep changing.
Launched three months earlier, Codex would have died; the only difference is that the model improved
Lenny: With things changing this fast, how do you plan? How far ahead do you look?
Andrew Ambrosino: We have no revolutionary methods for planning. The basic principle is that the closer something is to the present, the more specific the plan needs to be. That doesn't mean you don't make a nine-month plan, but that plan has to stay very vague. Any precision you add to a nine-month plan today is false precision and only wastes time.
On the app side, what you plan in November may still be right in December, but by February things look completely different.
At my previous company, once we started driving feature development from model capabilities, the original product process basically collapsed. It turned into this: list every direction we were interested in, prototype each one, judge which were feasible now, and set the others aside for the time being. Whenever model capabilities made a new leap, we took the shelved items out and tried again. Whether a feature ends up good enough often depends not on the form of the feature itself but on exactly how smart the model is. A lot of people have always been unhappy with the way I plan. But this really is hard.
Lenny: Is there a concrete example that shows how much timing matters?
Andrew Ambrosino: There is a very good story about Codex. I am quite sure that if the Codex app we released in February had been ready and released in November, it would have failed completely in the market. The only difference is that the model improved between November and February. Same product, identical form, completely different result, just a few months apart.
Lenny: So things that don't work now might work later? You should keep your ambitions big?
Andrew Ambrosino: Yes. I'd advise people not to conclude too quickly that "this doesn't work now, so it's a bad feature". It might just not be time yet.
Go back to the initial release of Codex, Codex web. Basically, you give the model a task, it goes off and does it, and it comes back with the result when it finishes. That doesn't sound radical, but the problem was that it didn't do the job well enough. The form was too far ahead of its time.
Then Claude Code came out: completely local, not even in the cloud, not pretending to be some kind of AGI. It asks you questions, it waits for you, and you can't delegate your whole life to it. It was much easier to use because it matched the level of model capability at the time.
We were too "AGI" back then, and I often think about that lesson. In the past, a product failing in the market could usually tell you a lot about problems with the product's form or how it was communicated. Now it's different: you might need to ship the same thing six times before it succeeds, with the form possibly completely unchanged.
The in-app browser is the same story. We once had a working version: back in the Atlas period, we already had agents executing tasks in the browser. Further back there was Operator in ChatGPT, which didn't succeed. But if you line up Operator, Atlas, Codex, and ChatGPT, you will find you can draw a continuous line of evolution through them. It is essentially the same capability, re-released again and again as the level of intelligence changed, and the outcome changed completely as a result.
Once a product or feature exists, people tend to focus on all kinds of detail problems and micro-optimizations, and they should. But this is also why we always keep a bottom-up exploration culture. Just as the Codex app once disrupted ChatGPT in some ways, Codex itself may be disrupted by new attempts in the future. You can't expect the same team to keep producing disruptive innovation while also staying focused on product quality and polish. At some stage you have to design a mechanism that lets these two capabilities coexist.
Lenny: What is the vision for Codex? Where do you want to take it?
Andrew Ambrosino: In January and February of this year, through internal dogfooding, we found that engineering and research workflows had very clear PMF. But at the same time, people in marketing, communications, finance, and legal were all using Codex too, even though the app wasn't friendly to them: it would show them code and ask them to approve running a command-line search tool.
We tried adding Codex capabilities to other products: the ChatGPT desktop app and the Atlas browser. Then the most annoying thing happened: no one was willing to leave the Codex app to use the products that were supposedly built specifically for them.
What this taught us is that developer tools and general knowledge-work tools actually have many subtle things in common. We do believe that the product form we are building is the right one to carry all kinds of deep vertical scenarios. Start simple, then add complexity gradually as needed.
It is not the kind of product where you "draw a rectangle on the screen and everything has to happen inside it". It's more like a base camp: you start work here, finish work here, and manage automated workflows here, and it calls whatever tools you need. Someone called this form a "super app". I really wish they hadn't, because now I have to hear that term almost every day.
Dan Shipper has a very interesting idea: he thinks that in the future we will use SaaS tools "from the inside out" within Codex. You won't open Notion, Linear, or Salesforce in a browser; an agent in Codex will operate them for you. We are indeed working on this: the in-app browser, the Chrome extension, and computer use are all ways of letting Codex interact with external tools.
One of the best examples: Brent, our in-house video producer, used Codex to edit the Codex launch video. Codex is not a video editor and has none of that UI. But it understood that Brent was using Premiere Pro and could make some changes by editing the files behind Premiere Pro. When it found that it couldn't do everything that way, it wrote a Premiere Pro extension on its own, installed it into Premiere Pro, and then talked to Premiere Pro through that plugin. We were all shocked when we saw it.
This is a very good pattern: let professional tools do professional things. Codex doesn't need to become a better video editor; it just needs to interact seamlessly with professional tools.
Writing code no longer matters; deleting code does
Lenny: From hand-written code to AI writing 100% of the code, and now agents and loops. How exactly do frontier teams work now?
Andrew Ambrosino: Loops? That was last week's thing.
We are always discussing the question of "what proportion of the product's code is written by AI". By last year's standard, 100% of our code is now written by AI. So the question is no longer "how much is written by AI" but "was the code written under supervision or without it", which is a completely different dimension. I welcome this bar being continually reset, because it means we are making progress on the product.
We have done a lot of exploration in autonomous software development, including a lot of harness engineering and questions like "what if the model automatically garbage-collected and cleaned up the codebase overnight?"
But all current models share one problem: they keep adding complexity. If any researchers are listening: please, teach the models to delete code. When you try to hand development over entirely to autopilot, this becomes a serious problem, at both the human level and the codebase level.
The same goes for feature requests. How do you teach a model to judge which features are worth building, which should be ignored, and which should be merged and redefined? And how do you teach a model to build the right abstractions?
These abilities are all steadily improving. But I don't think we have reached the stage where you set up a loop and let the model automatically "improve the app", continuously listening to Twitter, Slack, and email and then iterating autonomously. That said, we are trying to make it a reality.
Lenny: Do you think we'll get there? Just set a goal: "Win"?
Andrew Ambrosino: "/goal make me ten billion dollars." I don't know. I won't say never, and I won't say it will definitely go one way or another.
Lenny: As a product and engineering lead, how do you use AI in your own work?
Andrew Ambrosino: I think I may have the best job in the world right now.
When I first started on Codex, my personal goal was to make it good enough that I could use it to write Codex's own code. That was a super-tight dogfooding loop: there was something I couldn't do, I fixed it, then I could do it, and then I could do more.
Later my role changed. I needed to do more product discovery, figure out what the team was working on, and correct things that were drifting off course. So Codex became my tool for doing those things. "Help me build a spreadsheet that organizes this data." "Do an internal deep-research pass for me and look at what explorations have been done in this direction in the past."
The May series of launches, covering the in-app browser, computer use, and artifact creation, was the first time we used vibe coordination to manage a launch. I kept a Notion document with all the to-dos and had Codex automatically go through PRs and Slack channels to collect progress and update the status tracker. At the time it felt like the most cutting-edge way to manage a product launch.
Now, when I get up every morning, I look at the daily report Codex has generated for me, which filters the 3,000 Slack channels I've joined down to the things that need my attention. I can reply with "give me five questions and I'll answer them". It adjusts itself: I say "next run, focus less on this workflow" or "this happened but didn't show up in the daily report, make sure you catch it from now on", and it updates how it notifies me and adjusts its priorities.
Lenny: How is this set up? What is the workflow?
Andrew Ambrosino: It's still at the discovery stage. I just created a scheduled task: "go through my Slack channels, these are the things I care about, organize them by these categories, here is some context." The first few runs may need some guidance. The nice part is that I don't have to go find out how to edit the instructions. I just say in conversation, "change this for me next time", and it's updated.
But I think this is also the core problem with the chatbot form. I know how to set it up because for me this is product discovery. But if you don't work at OpenAI and aren't building this thing, you won't want to figure all this out. We need to work out forms that make these things usable for ordinary people too.
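For readers who want a feel for the shape of that workflow, here is a minimal sketch of a digest whose instructions are adjusted by feedback between runs. It is not how Codex implements the feature: the `DigestPrefs` class, its `mute` and `watch` methods, the sample messages, and the keyword matching (standing in for the model's judgment) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigestPrefs:
    """Hypothetical stand-in for the instructions a scheduled digest task keeps."""

    # category name -> keywords that place a message in that category
    categories: dict[str, list[str]]
    # categories the user has asked to see less of
    muted: set[str] = field(default_factory=set)

    def mute(self, category: str) -> None:
        """Feedback like 'next run, focus less on this workflow'."""
        self.muted.add(category)

    def watch(self, category: str, keyword: str) -> None:
        """Feedback like 'this was missed, make sure you catch it from now on'."""
        self.categories.setdefault(category, []).append(keyword)


def build_digest(messages, prefs):
    """Group (channel, text) messages by category, skipping muted categories."""
    digest = {}
    for channel, text in messages:
        lowered = text.lower()
        for category, keywords in prefs.categories.items():
            if category in prefs.muted:
                continue
            if any(keyword.lower() in lowered for keyword in keywords):
                digest.setdefault(category, []).append(f"#{channel}: {text}")
    return digest


# Made-up messages for illustration only.
messages = [
    ("launch", "In-app browser blocked on review"),
    ("infra", "Deploy failed for frontend"),
    ("design", "Sidebar redesign needs a decision"),
]
prefs = DigestPrefs(categories={"launch": ["browser"], "incidents": ["failed"]})

first_run = build_digest(messages, prefs)

prefs.mute("incidents")  # "focus less on this workflow"
second_run = build_digest(messages, prefs)

prefs.watch("decisions", "needs a decision")  # "ensure later runs catch this"
third_run = build_digest(messages, prefs)
```

The point of the sketch is the loop rather than the matching logic: each piece of conversational feedback changes the stored preferences, and the next scheduled run picks them up without the user editing any instructions by hand.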
Lenny: I also used Codex to build an automation that filters spam. One step required going into the Google Cloud console and setting up a bunch of API triggers, and that interface is especially annoying. I just had Codex do it for me, and it took over my computer directly and operated it using the computer use feature.
Andrew Ambrosino: It's just like: "I don't care whether you have connectors, bro, I'm just going to start clicking."
How to draw the boundaries among connectors, the in-app browser, the Chrome extension, and computer use is a very interesting question. A lot of the time, these boundaries are really worked out by feel.
I find these personal workflows especially interesting. Everyone is trying all kinds of things, and everyone builds a completely different system. But gradually some common patterns emerge, and then we realize: "This should become a first-class experience in the product."
Take memory. Many people are setting up Obsidian knowledge bases or Notion workspaces to build their own mind palaces. You shouldn't have to do this yourself; there should be a sufficiently general memory feature that does it for you. We are always filtering: what works for individuals but should stay at the individual level, and what should enter the product as a foundational component?
Lenny: From the outside, it looks like all you do is win. But there must have been times when things didn't work out?
Andrew Ambrosino: It's pretty funny to hear you describe it that way. This is actually the first time I've felt like I'm not failing.
Before this I spent many years on a startup, and in the end we basically broke the company up and sold it. We were in a highly regulated industry, and the whole process felt like continuous failure. Then I went to another startup building AI tools in another closed, regulated industry, and that also failed to work again and again. I have actually failed a lot. Sometimes it's just a moment when skills, passion, and the market window happen to line up.
Even now, within this project that combines Codex and ChatGPT, there are countless small failures. We say "it should look like this" and post it to Slack, and underneath there are 2,000 messages saying how stupid we are. This is what I like about OpenAI: people tell us directly and show no mercy toward the failures of internal products, which is also why the external products turn out pretty well. Before I got to where I am now, I failed for about 10 to 15 years. So every day I'm still a little surprised that things are actually going smoothly.
Lenny: Any final advice for readers?
Andrew Ambrosino: Don't "marry" your current workflow. What you should really hold on to are the results that only you can uniquely deliver. Beyond that, keep trying to change your process. If the skill you're proudest of is "I understand Figma auto layout better than anyone", then what are you doing? AI will get better than you at that too. Find the things worth doing, then figure out ways to do them.