
Interview with Mercor Founder: AI Will Soon Dominate the Talent Assessment Process
Humans will remain more involved in the "selling" side of hiring.
Author: MD
Publisher: Bright Company

Recently, Jacob Effron (center) and Patrick Chase (left), partners at U.S. venture capital firm Redpoint Ventures, sat down with Brendan Foody (right), founder and CEO of AI recruitment platform Mercor, on the podcast "Unsupervised Learning." Beyond discussing changes in Mercor’s core business of AI-powered hiring, the trio explored the evolving relationship between AI and humans in the future of work.
Mercor was founded in 2023 by three 21-year-old Thiel Fellows, including Brendan Foody. In February this year, the company announced a $100 million Series B round at a $2 billion valuation. The round was led by Felicis, with participation from Benchmark, General Catalyst, and DST Global. Mercor uses AI to automate resume screening, candidate matching, AI interviews, and compensation management, aiming to boost hiring efficiency and reduce human bias.
In the conversation, Brendan Foody noted that Mercor has now entered the field of AI model evaluation and data labeling. As AI models grow more capable, many complex questions can no longer be verified through models alone or common sense. Model developers increasingly need highly knowledgeable experts from specialized domains. However, such roles are often short-term and non-traditional—similar to “expert networks”—making it natural for platforms like Mercor to connect top AI labs with niche talent. Foody emphasized, “The data labeling market is shifting from large-scale, low-barrier crowdsourcing to high-quality, expert-driven annotation.”
Regarding its core business—AI recruitment—Foody believes AI has already reached or surpassed human performance in text-based talent assessment, especially in resume screening and textual interview analysis. However, AI still lags in multimodal tasks such as judging emotion and interpersonal “vibe.”
Foody also shared a key insight: As future hiring and talent evaluation become increasingly dependent on rich contextual data, the completeness of feedback mechanisms and data inputs will directly impact model effectiveness. For example, if you’re hiring an investor, feeding the model their podcast appearances, meeting notes, and other behavioral traces creates context that enables far better judgment of their cognitive style, capabilities, and job preferences. In traditional hiring, such data is either ignored or too costly to evaluate—while AI makes this process cheaper and more efficient.
Therefore, the division of labor between AI and humans may evolve so that AI quickly dominates the assessment phase—improving both speed and accuracy—while humans focus more on the “selling” side: communicating team culture, motivation, and engagement to enhance candidate experience.
“The trend I see,” said Brendan Foody, “is that humans will focus on creating evaluations to teach models how to do things they currently can’t, rather than repeating the same tasks over and over again.”
Below is Bright Company’s edited translation of the full interview:
Jacob: Brendan Foody is co-founder and CEO of Mercor, a company building infrastructure for AI-native labor markets. The Mercor platform is used for data labeling, talent screening, performance prediction, and evaluating both human and AI candidates. This is a fascinating company positioned at the intersection of hiring assessment and improving foundational models.
Brendan’s team recently raised $100 million and is working with some of the most advanced AI companies today. Our conversation today touches on many interesting topics, including the future role of humans in the workforce. We discuss which types of data labeling are most valuable for model improvement, Brendan reflects on Mercor’s rapid rise and key decisions, and we dive into where AI works—and doesn’t work—in the hiring process. It was a truly engaging discussion, and I think you’ll enjoy it. Brendan Foody, thanks for joining our podcast.
Brendan: Thank you so much for having me. I’m a huge fan—really excited to be here.
Jacob: Great to have you. Let’s start broad—for our listeners, could you walk us through where we are today? What’s the current state of AI in talent assessment? What works, what doesn’t, and how far along are we?
Brendan: I’m actually surprised by how well it performs. I think for anything humans can assess via text—interview transcripts, written evaluations, signals on a résumé—models are nearly at or beyond human-level performance. It’s an interesting dichotomy because these technologies are still underutilized across the economy. That gap represents a massive opportunity—one we’re very excited to build into.
Jacob: Were there things that simply didn’t work before reasoning models emerged? With models getting stronger over the past six months, what finally started working?
Brendan: Yes, when GPT-4 launched, we built our first prototype AI interviewer—and nothing worked. The model would hallucinate every two or three questions. But progress since then has been incredible. The arrival of reasoning models clearly elevated model capabilities, especially in handling large contexts, identifying key points, and maintaining focus.
That said, models are still weak on multimodal tasks, partly because labs haven’t prioritized them, and reinforcement learning for such tasks is harder. Still, we’re excited about progress here.
Jacob: What milestones are you most excited for models to achieve?
Brendan: Things like judging “vibe”—whether I’d want to work with someone, whether they’re passionate or genuine. These are hard even for top humans, let alone models. I’m eager for breakthroughs here and am building tools to measure them. Yet whenever I read a model’s reasoning chain and compare it to our internal evaluators, I find the model is far more rational than our own researchers who design the assessments.
Models are advancing incredibly fast—you see it in coding—but we’re only just beginning. Many other domains are taking off at astonishing speed.
Jacob: A big part of your work involves designing human evaluations to assess job readiness. Now that many people are building AI agents to perform employee tasks, are you involved in that space?
Brendan: Absolutely—we’ve done a lot here. To give some background: we started Mercor because we saw immense global talent being overlooked, largely due to fragmented labor markets. Remote candidates apply to few jobs; San Francisco firms consider only a narrow pool because they manually solve matching problems. By applying large models, we can fix this—building a unified global labor market where every candidate can apply and every company can hire. But we later realized that with new knowledge-intensive roles emerging, demand for human talent to evaluate large models has surged. So now we recruit experts for top AI labs. These labs use our tech not only to evaluate experts but also to assess models and the AI agents you mentioned.
Patrick: For our audience, Mercor also uses AI in candidate screening, resume processing, etc. Can you walk us through your AI use cases and tech stack?
Brendan: A good approach is to replicate everything humans do manually, create evaluations around it, and see if we can automate. How do humans review resumes, conduct interviews, rank candidates, decide hires? We automate all of it—evaluating the accuracy of resume parsing, scoring sections of a resume, asking interview questions correctly, assessing responses accurately—then feed all that into model context, along with reference letters and other data, to make hiring predictions.
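(Editor's note: purely as an illustration of the pipeline Foody describes above, a minimal sketch, assuming a generic prompt-in, text-out model call, might look like the following. The CandidateContext fields, the prompt wording, and the call_model helper are hypothetical placeholders, not Mercor's actual system.)

```python
from dataclasses import dataclass, field

@dataclass
class CandidateContext:
    """Aggregates the signals a human reviewer would normally read."""
    resume_text: str
    interview_transcript: str
    reference_letters: list[str] = field(default_factory=list)

def call_model(prompt: str) -> str:
    """Stand-in for whatever LLM API is actually used; replace with a real client."""
    return "score: 0 (placeholder response)"

def build_prompt(candidate: CandidateContext, role_description: str) -> str:
    """Mirror the manual steps: read the resume, the interview, and the references,
    then ask for a scored judgment with an explicit rationale."""
    references = "\n\n".join(candidate.reference_letters) or "None provided."
    return (
        f"Role description:\n{role_description}\n\n"
        f"Resume:\n{candidate.resume_text}\n\n"
        f"Interview transcript:\n{candidate.interview_transcript}\n\n"
        f"Reference letters:\n{references}\n\n"
        "Score this candidate's fit for the role from 0 to 100 and explain "
        "which specific signals drove the score."
    )

def predict_fit(candidate: CandidateContext, role_description: str) -> str:
    """Feed all available context into the model and return its prediction.
    A production system would also evaluate each step (resume parsing accuracy,
    question quality, response scoring) and post-train on hiring outcomes."""
    return call_model(build_prompt(candidate, role_description))
```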
Patrick: Do you mainly use off-the-shelf models, focusing on evaluation and context design?
Brendan: Yes—we use existing models for base tasks, but for final candidate assessment—the hardest part—we do post-training. We learn from client data: who performed well, why, and what signals matter. From that, we improve future hiring predictions.
Patrick: Have you discovered any surprising signals? Things AI found that humans missed?
Brendan: Many examples. A key AI advantage is deeper analysis of fine-grained details—small signals humans overlook due to “vibe judgments.” For instance, if someone shows deep interest in a field purely out of passion—not job requirements—that becomes a signal. Or if someone studied abroad in the target country, they might communicate more smoothly and fit better culturally. These nuances vary by project and client.
Patrick: Are there things that will always require humans? You mentioned multimodal tasks—how do you see collaboration between AI and human interviewers evolving? Will everything eventually be AI-evaluated?
Brendan: Simply put, hiring splits into assessment and selling. The assessment phase will soon become extremely strong—people will notice AI recommendations are significantly more accurate and begin trusting them more. Humans will continue playing a major role in the selling phase: helping candidates understand the team, role, and culture. AI allows hiring managers and HR to focus only on top-fit candidates, avoiding wasted time on mismatched ones. This lets them better guide candidates on motivation, team dynamics, and incentives.
Patrick: Will people start “gaming the system”—trying to optimize for evaluation signals? Have you seen this? Like everyone claiming they studied in the target country?
Jacob: Everyone claims they studied in the target country.
Patrick: Right—like saying they studied locally.
Brendan: Yes, so sometimes we keep signals confidential. Like all major hiring processes, we face this often. The key is keeping evaluations dynamic—rotating questions or asking deeply personalized follow-ups based on the candidate’s background. Because models enable unprecedented depth and breadth in talent assessment.
For example, my first executive interview might rely on a few minutes of LinkedIn and notes. But if I can listen to their podcasts, read their blogs or papers, and ask targeted questions, the depth and specificity are completely different.
Jacob: Your models are great at predicting candidate performance—does this process need explainability, or is a black-box output sufficient?
Brendan: I believe explainability matters for two reasons. First, clients need to understand and trust model conclusions—building confidence through transparent reasoning chains. Second, we must ensure models make decisions for the right reasons. So explainability adds real value.
But I think the ultimate economic form may just be API-style: people need work done—or limited human input—and just need a confidence interval predicting whether someone can succeed. In that flow, human intermediation drops dramatically.
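(Editor's note: a rough sketch of what such an "API-style" output could look like, assuming a single structured prediction with a confidence interval. The FitPrediction fields below are illustrative assumptions, not a published Mercor API.)

```python
from dataclasses import dataclass

@dataclass
class FitPrediction:
    """Hypothetical shape of an API-style hiring prediction."""
    candidate_id: str
    role_id: str
    predicted_success: float                   # point estimate, 0.0 to 1.0
    confidence_interval: tuple[float, float]   # e.g. a 90% interval
    rationale: str = ""                        # optional, kept for explainability

# A consumer of such an API might only need the interval to decide how to proceed.
prediction = FitPrediction(
    candidate_id="cand_123",
    role_id="role_456",
    predicted_success=0.72,
    confidence_interval=(0.61, 0.83),
    rationale="Strong domain track record; limited evidence on team fit.",
)

if prediction.confidence_interval[0] > 0.6:
    print("Advance the candidate with minimal human intermediation.")
```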
Jacob: That makes sense—trust like that is a milestone on the way to that goal. Data labeling today has clear feedback loops—e.g., multiple people label the same data. What are the challenges in applying this to fuzzier human work, where feedback might take 15 years?
Patrick: Like VC (laughs).
Brendan: My view: if 100 people do the same job, ranking them is easy. But if 100 people do entirely different jobs—like founders—each with unique outputs—it’s hard to find common patterns or link behaviors to outcomes. Too many variables. For homogeneous roles—say, hiring 20 account managers—models can learn and optimize signals. But for complex roles, like evaluating a group of Thiel Fellows, it’s far harder and relies more on model reasoning.
Jacob: What specific challenges arise?
Brendan: Mainly, a lot of information never enters the model’s context. People forget to add details. For example, a friend tells me a company’s product is amazing—that info isn’t in the model. Ensuring all reference letters and interpersonal nuances get fed in is the main challenge. We’ve found that simply inputting the necessary data into model context solves most issues.
Jacob: Maybe one day everyone’s smart glasses will record everything, feeding it to models in real time.
Brendan: Exactly.
Jacob: Could it go as far as Bridgewater?
Brendan: Possibly. But many companies will resist—due to legal, compliance, or cultural concerns. Still, better processes will emerge. For example, AI conducting exit interviews, talking to managers and teammates to extract deeper insights. People hold vast amounts of detail in their heads—we just need to get that into models to make superhuman predictions.
Patrick: More founders and professionals now bring AI to meetings, so many conversations get recorded and used for training. Fascinating.
Jacob: We could run our meeting transcripts through AI to rank ourselves.
Patrick: Haha!
Jacob: As long as I stay at the top.
Patrick: What’s your take on today’s data labeling market? How do players differentiate? Scale AI seems dominant, but many new entrants are emerging—what’s your view on the landscape?
Brendan: Most people don’t realize how fundamentally the data labeling and evaluation market has changed. It’s nothing like two years ago. Back then, models were weak and easily fooled—high school or college students could do the labeling via crowdsourcing for SFT (Supervised Fine-Tuning) and RLHF (Reinforcement Learning from Human Feedback), choosing between preference options.
But as models got smarter, crowdsourcing failed. You now need high-caliber talent working directly with researchers to understand why models succeed or fail, designing complex data to challenge models and reflect real-world automation challenges. Our platform excels at quickly sourcing these high-quality experts.
This fueled our growth and partnerships with top labs. I expect this trend to continue. Companies stuck on mass crowdsourcing will struggle; new players focused on quality talent will capture share.
Patrick: Do you think human involvement in data labeling will always be needed? As models grow stronger—even training smaller models themselves—how do you see this evolving?
Brendan: As long as there are tasks humans can do that models cannot, we’ll need to create or simulate environments for models to learn. Some domains will fall quickly—math, code—where data is small and verifiable, so models solve them fast. But others are open-ended—like evaluating great founders or many knowledge jobs—where it’s hard to define “good,” requiring human understanding to be encoded into models. That’s why I expect human data (note: human-generated or human-centric data) and evaluation markets to grow exponentially.
Jacob: If I understand correctly, your original “arbitrage point” and company inspiration was that brilliant programmers exist globally but lack access to certain jobs—making programming data crucial. You’ve clearly expanded beyond that. Programming is a perfect RL and evaluation use case—what had to change or improve when entering fuzzier domains and recruiting for related roles?
Brendan: I think leveraging human heuristics is powerful. For example, to automate a consultant’s job, how do you evaluate them? Give them case studies—perhaps relevant to their background.
Jacob: Your team likely excels at evaluating programmers. But if you bring doctors onto the platform, how do you know what heuristics to use for evaluating doctors?
Brendan: That’s a great point—when entering domains beyond ML teams’ expertise, you need experts. We need doctors to help design evaluation frameworks for doctors—same for other fields. And researchers face the same issue: grading high school physics is easy, but without a PhD in chemistry, a researcher can’t judge advanced answers or improve evaluations. So this is another shift you asked about—whether evaluating talent or models, it will become more collaborative, requiring expert partnerships to advance models.
Jacob: I recall you saying these short-term data labeling contracts were the perfect entry point—huge demand, a wedge into end-to-end labor markets. Can you walk us through your path and stage goals toward that vision?
Brendan: I wrote a “secret master plan” on this. My view: marketplaces have strong network effects—this creates moats but also makes them hard to build. So we’re laser-focused on capturing massive demand and expanding network effects.
At the same time, we see big tech clients needing hundreds of contractors—data scientists, software engineers—roles not directly tied to human data, but with similar underlying needs, just in more traditional markets, previously served by Accenture or Deloitte. We’re making this our second priority, then expanding into full-time hiring. Ironically, we started by helping friends hire contractors—many of whom later converted to full-time.
So these businesses are continuous, sharing core needs: more candidates, faster hiring, higher confidence in fit. By continuously measuring and improving these metrics, we serve companies at every stage.
Jacob: Was there a moment when you realized the human data opportunity was huge—clear enough to pivot?
Brendan: Yes, during college. Background: my co-founders and I met in high school at 14, started companies at 18. They won many competitions—I wasn’t as skilled—but kept building. Later, we began hiring international talent from India, partnering with IIT Code Club, finding brilliant people unable to land jobs. We hired them for projects, and friends paid us to recruit—we earned small service fees, grew to $1M revenue, and made $80K profit after salaries.
I was proud, but my parents weren’t impressed—only when we raised funding did they approve. Back to your question: in August 2023, a client introduced us to one of xAI’s co-founders, who was then still working out of Tesla’s office, and told him Mercor had super-engineers in India who were strong in math and coding. The next day, xAI’s founder called us, thrilled. Two days later, we were in Tesla’s office, meeting nearly the entire xAI founding team—except Elon—right before their meeting with him. We were still in college—it was mind-blowing. We kept asking: why are they so excited about our product? Because the market had shifted fast and no one had noticed. Now we’ve scaled and taken key market share, so we can talk about it. But they weren’t ready for human data yet—about six months later, we partnered with frontier labs and scaled the business.
Jacob: You saw the wave coming.
Brendan: Yes, I find many founders force PMF too hard. You should watch market signals—go where the gold is. If initial sales are tough, scaling will be harder. Find the most painful, highest-paying customers—those willing to pay anything to solve the problem—and go all in.
Jacob: You’ve moved beyond programming. The doctor example makes me think: eventually, the standard for evaluating good doctors will be used by model companies to train models—assessing medical reasoning. What exactly do you do with clients?
Brendan: One area humans still beat AI is continuous learning and improvement. We look for proxy signals: candidates who ask the right questions, think correctly, have experience in high-performance environments—these help spot model flaws and boost capability.
Jacob: Do you use your own product internally? How does it work in practice?
Brendan: Of course—except for executive roles, we use it for everything. We post exec roles too, but I usually interview first—mainly to sell the role, not screen. Our AI interviews are highly effective—often the strongest predictive signal. Many underestimate “vibe judgment” bias in hiring—people always think they’re right.
Jacob: Hiring is the original “vibe” industry.
Patrick: VCs definitely don’t have that bias.
Brendan: So we use performance data to decide. For example, when hiring a strategy lead, we used to do human case studies. Now we use AI interviews exclusively—and conversion rates improved. AI makes comparisons more objective and standardized, eliminating inconsistent human judgment.
Patrick: For your own evaluation process—do you use internal people or the marketplace? Mostly internal?
Brendan: We use the marketplace for our own evaluations—same as clients. Researchers still participate—analyzing model errors, refining error taxonomies, optimizing post-training data. The process and staffing are identical.
Jacob: You mentioned using multimodal abilities to judge passion—any plans for video/audio?
Brendan: I often think about RL’s role in boosting video understanding. RL excels at search, and video is dense—so models struggle. We need to identify key signals in multimodal context: Is the candidate excited? Cheating? We must generate the right data to train models on these signals—frontier labs are advancing core capabilities.
Jacob: As you said, the labeling market has changed drastically in just years. Where will it be in two years? Will this business still exist—or just experts?
Brendan: I think it’ll remain important. Our mission was always to aggregate labor for more efficient allocation. The key is anticipating humans’ economic role five years out.
The trend I see: humans will focus on creating evaluations to teach models new skills, not repeat old tasks. I’m bullish on knowledge work transitioning to evaluation—formats may become more dynamic, like solving problems in dialogue with an AI interviewer. I believe this will be a major economic sector—though most don’t realize it yet, confusing it with the SFT/RLHF markets, whose value is declining and budgets are shrinking.
Patrick: What skills should people cultivate? If advising students, what would you say?
Brendan: I’d emphasize rapid learning ability—change is too fast. Many assumed models would take decades to master certain fields—breakthroughs come sooner. Learn to collaborate with AI. People on our platform love working with models daily—thinking about what models miss or lack. This helps them identify where AI boosts efficiency in real jobs. Use models constantly, understand their domain-specific strengths and weaknesses. It’s helpful—but hard to prescribe specific roles like software engineer.
Jacob: Interesting—maybe we’ll all spend time training models. Hard skills have right/wrong answers, but subjective areas are nearly infinite. Maybe one day we’ll earn money working for our personal AI models.
Brendan: Totally agree. I’d also advise focusing on high-demand elasticity fields. Software development, for example, has 100x–1000x latent demand—not just 1000 new web apps, but endless feature iterations, algorithm optimizations. In contrast, accounting demand is fixed. Aim for areas where demand will surge and total productivity rises—much safer.
Patrick: Spot on. I spoke with a founder recently who said everyone talks about software engineers being replaced, but honestly—he desperately needs more.
Brendan: I’m excited too. If we boost software engineers’ productivity tenfold, we might hire even more. The relationship between demand and price is always fascinating.
Jacob: At the start, were you tempted to build hiring collaboration tools or software for agencies? Why choose end-to-end service? Was this decided early?
Brendan: Early on, we applied first-principles thinking—which helped, since we didn’t know traditional approaches. We knew our friends just wanted to find reliable software engineers, so we owned the entire process. Looking back, I think more companies will go end-to-end—it’s pointless building collaboration tools for jobs that may vanish. Better to automate the whole workflow, letting it learn and optimize from feedback.
Jacob: True—especially in your data labor market, going end-to-end makes sense while AI isn’t mature. Without that, you might’ve started with collaboration tools.
Brendan: Yes—for full-time hiring, clients want employees on their payroll. We were lucky—our operating model aligned perfectly with shifting market needs.
Jacob: You started by helping friends hire contractors. Did you initially see this as a side hustle? When did you commit fully?
Brendan: I’ve been building things since high school—I ran a solid business—so I didn’t want to go to college. I told my parents; they weren’t happy, so I applied to appease them but kept saying I’d drop out. They didn’t believe me—they figured once I enrolled, I’d stay. But I said the same thing every semester, until I finally dropped out with no warning, because by then I’d been saying it for two years.
Patrick: I knew you’d drop out.
Brendan: For me, I always knew I wanted to build impactful things, not sit through useless classes. I was searching for something worth going all-in on. My co-founders initially treated it as a side project—a way to gather proof to convince their parents to let them drop out. Their parents demanded a successful seed round—even $1M in revenue and a profit wasn’t enough. So parents are VCs’ true LPs—funding equals credibility.
Jacob: Right—no parents, no VCs.
Brendan: It’s “authority validation.”
Patrick: Speaking of funding—congrats on your $100M Series B earlier this year! How will you use it? How do you decide when to raise?
Brendan: We only actively raised for the seed round—to convince our parents to let us drop out. The Series A and B were entirely inbound, with investors competing to get in. Our philosophy: keep dilution around 5% and build a “war chest” for R&D—product incentives, innovative consumer products, supply-side expansion, and more post-training data to boost model prediction accuracy. Our biggest ML bottleneck is running more evaluations and training environments—which aligns perfectly with our core business.
Jacob: Many of your clients are foundational model companies. What’s your view on the future of this space? Some say only 2–3 giants will remain—how many players do you expect? How will they differentiate?
Brendan: Great question. I firmly believe OpenAI is and will remain a product company, not an API company. Many API capabilities will commoditize—the edge lies in deep integration with customer workflows, which creates pricing power. But the market is large enough for many to capture significant value in niches. Even a lab focused solely on hedge funds could generate massive profits. People dismiss valuations as inflated, but from a first-principles view of “automating knowledge work,” these top teams can build extraordinary companies.
Jacob: Models generalize well across domains—so is it winner-take-all, or will niche leaders emerge? Your hedge fund example suggests room at the application layer.
Brendan: Focus creates value. Building generic APIs isn’t a good business—only one player will survive. Most value will be at the application layer, where each vertical and use case demands deep customization.
Jacob: Will these custom models require complex labeling?
Brendan: Definitely. For example, each trading firm can build evaluations around their unique analysis—judging which conclusions are accurate, which aren’t, and whether they drive profit. A world-class post-training team optimizing trading logic faster than human traders? The opportunity is enormous.
Jacob: Feels like some trading firms’ optimal move is to pause trading for nine months and focus entirely on post-training.
Brendan: I’m surprised how little many trading firms invest in post-training—possibly geographic: they’re in New York, but labs and top researchers are in San Francisco, and elite researchers prefer doing AI over just making money. But I believe they’ll invest heavily—forming nine- or ten-figure partnerships with frontier labs to customize applications.
Jacob: What’s your biggest unknown in AI today? How would knowing the answer impact your company?
Brendan: It’s what you just asked: what will humans do in five or ten years? It’s an incredibly hard question—and central to our mission. We have intuitions, but the world moves too fast. Many jobs will be automated—we need a clearer picture of new human opportunities and economic roles. It’s critical.
Jacob: What should policymakers do? What role should institutions play?
Brendan: Regulators often focus on distant risks. I think within two to three years, people will genuinely worry that AI outperforms humans in most jobs—and we must figure out how to keep humans economically engaged. This isn’t a low-probability, high-impact risk. It’s inevitable. So regulators should proactively plan, manage expectations, and tell the public what the world will look like in a few years.
Jacob: True—even retraining programs aren’t well defined yet.
Brendan: Exactly. I hope for more discussion on the future of work, and clearer guidance for students and job seekers.
Jacob: We like ending interviews with quick-fire questions—broad ones, short answers. What’s overhyped and what’s underhyped in AI?
Brendan: Great question. I think evals are severely underhyped. They’re trendy now, but still vastly underrated.
Jacob: Humanity’s last fortress.
Brendan: I think SFT- and RLHF-style traditional data are overhyped. Companies have spent billions on this—unnecessarily. Spending should drop by an order of magnitude. That trend will shift.
Patrick: Any views on AI that changed in the past year?
Brendan: Interesting. I’ve dramatically accelerated my timeline for automating software engineering. I used to doubt researchers’ timelines for “AI writing PRs with higher merge rates than humans.” Now I think it’ll happen late this year or early next—very exciting.
Jacob: Yeah. Two years ago, if you predicted today’s AI capabilities, people thought it would change everything—but once achieved, it feels less shocking. Do you think this will cause massive shifts in software engineering jobs, or just 10–20% change?
Brendan: Again, it comes back to “demand elasticity.” Short-term, I’m not worried about engineer unemployment—tools make them more productive, so more software gets built. But job nature will shift—those who understand product and model limitations will have a competitive edge.
Patrick: Besides your own, which AI startup do you most admire?
Brendan: I’m very bullish on OpenAI’s coding ability—though that’s not a contrarian take. I also believe there will be many custom agents—one stealth company in France really interests me.
Jacob: Then you can’t say it on the podcast—let’s pressure you after recording (laughs).