Gemini 3 Launches Late Night: Outshines GPT-5.1, Ushering in Google's Era of Large Models

2025.11.19

Gemini 3 Launches Late Night: Outshines GPT-5.1, Ushering in Google's Era of Large Models

Google defines it as "an important step toward AGI," emphasizing that it is currently the world's most capable multimodal understanding agent with the deepest level of interaction.

2025.11.19 - 01:34:21

GoogleAI

Navigating Web3 tides with focused insights

Google defines it as "an important step toward AGI," emphasizing that it is currently the world's most capable multimodal understanding agent with the deepest level of interaction.

Twitter crashed before Gemini 3 even appeared.

No model launch has been more anticipated than Gemini 3. Given Gemini's previous pattern of updating every three months, the AI community has been eagerly awaiting Gemini 3 since September.

Today, a single-word tweet—"Gemini"—from Google's head of developer relations and lead of Google AI Studio, finally ignited the long-brewing anticipation, sending Twitter into a frenzy over related topics.

Interestingly, as the release approached, Twitter "conveniently" crashed several times. Although Cloudflare was behind the outages, the timing was so precise it raised suspicions of foul play (quiet whisper: after all, Twitter is the main promotional battleground for every model).

One wonders what Musk thought this morning after just releasing Grok 4.1—either way, memes from netizens have already flooded the internet.

Just now, Gemini 3 has officially arrived. Let’s see how strong it really is under such intense spotlight.

Most Intelligent Model

As it turns out, Google did not disappoint those who waited. With the official launch of Gemini 3, SOTA has once again been redefined—and both Altman and Musk sent congratulatory messages.

Google describes it as "a major step toward AGI," emphasizing that it is currently the world's most capable multimodal understanding agent with the deepest interactivity.

Gemini 3 not only sets new SOTA benchmarks in fundamental reasoning but also aims to reshape developer ecosystems and AI assistance experiences through the launch of the new Google Antigravity platform and Deep Think mode.

The Ultimate Reasoning Beast

Gemini 3 Pro is officially dubbed the "most advanced reasoning model," significantly outperforming its predecessor Gemini 2.5 Pro across nearly all mainstream AI benchmarks, while comprehensively surpassing key competitors like Claude Sonnet 4.5 and GPT-5.1.

Gemini 3 Pro tops the LMArena Leaderboard with a breakthrough score of 1501 Elo, achieving the highest scores on Humanity’s Last Exam (37.5% without using any tools) and GPQA Diamond (91.9%), demonstrating PhD-level reasoning ability. It also sets a new frontier standard in mathematics, reaching a record SOTA of 23.4% on MathArena Apex.

Beyond text and logic, Gemini 3 Pro redefines the upper limits of multimodal reasoning. It scored 81% on MMMU-Pro and 87.6% on Video-MMMU, meaning it handles complex scientific charts and dynamic video streams with ease.

Notably, it achieved 72.1% on SimpleQA Verified, showing significant progress in factual accuracy—it’s not just powerful, but reliable.

A Thought Partner That Refuses Flattery

The evolution of Gemini 3 Pro isn’t just about benchmark scores—it’s about the quality of interaction. It sheds the clichés and excessive flattery typical of earlier AIs, becoming smarter, concise, and direct: telling you what you need to hear, not just what you want to hear.

It acts as a true thinking partner, offering new ways to understand information and express yourself—from translating obscure scientific concepts via high-fidelity visualizations to creative brainstorming.

Gemini 3 Deep Think

Gemini 3 Deep Think mode further expands the boundaries of intelligence, bringing significant advancements in reasoning and multimodal understanding to help you solve more complex problems.

In testing, Gemini 3 Deep Think outperformed the already impressive Gemini 3 Pro on Humanity's Last Exam (scoring 41.0% without tools) and GPQA Diamond (93.8%). Additionally, it achieved an unprecedented 45.1% on ARC-AGI-2 (code execution, verified by ARC Prize), showcasing its ability to tackle entirely novel challenges.

Gemini 3 Deep Think mode excels in some of the most challenging AI benchmarks.

Learn, Build, and Plan

Learn Anything

From the start, Gemini was designed to seamlessly integrate multimodal information across any topic—including text, images, video, audio, and code. Gemini 3 pushes the boundaries of multimodal reasoning further by combining advanced reasoning, visual and spatial understanding, leading multilingual performance, and a million-token context window, helping you learn in the way that suits you best.

For example, if you want to learn how to cook a family recipe, Gemini 3 can interpret and translate handwritten recipes in different languages, generating shareable versions for your relatives.

Or, if you're learning a new subject, simply provide academic papers, long-form video lectures, or tutorials—it can generate interactive flashcards, visualizations, or code in other formats to help you master the material.

It can even analyze your pickleball match videos, identify areas for improvement, and create training plans to全面提升 your game.

To help you better understand online content, AI Mode in Search now uses Gemini 3 to deliver new generative UI experiences—such as immersive visual layouts, interactive tools, and simulations—all generated instantly based on your queries.

Build Anything

Building on the success of 2.5 Pro, Gemini 3 fulfills its promise to turn developers’ ideas into reality. It excels in zero-shot generation, handling complex prompts and instructions to render richer, more interactive web user interfaces.

Gemini 3 is Google’s best Vibe coding and Agent coding model to date, making Google’s products more autonomous and significantly boosting developer productivity. It leads the WebDev Arena leaderboard with an impressive 1487 Elo score. It also achieved 54.2% on Terminal-Bench 2.0, a test measuring a model’s ability to use tools by operating computers via terminal. Furthermore, it greatly surpassed the 2.5 Pro version on SWE-bench Verified (76.2%), a benchmark for evaluating coding agents.

Now, users can build with Gemini 3 via Google AI Studio, Vertex AI, Gemini CLI, and Google’s new agent development platform, Google Antigravity. It’s also available on third-party platforms like Cursor, GitHub, JetBrains, Manus, and Replit.

For instance, creating a retro 3D spaceship game with richer visuals and stronger interactivity.

Or building more sophisticated, interactive web UIs and applications.

Plan Anything

Since the Gemini 2 agent, Gemini has significantly enhanced its planning capabilities for long-horizon tasks.

Gemini 3’s planning prowess is further validated on Vending-Bench 2: Gemini 3 topped the leaderboard in a simulated vending machine management test, fully managing virtual business operations through long-term planning.

In a full-year simulation, Gemini 3 Pro maintained consistent tool usage and decision coherence, staying focused on goals while achieving higher returns on investment.

Gemini 3 Pro demonstrates superior long-term planning capability, generating higher returns compared to other cutting-edge models.

Gemini Agent can also help organize your Gmail inbox.

Gemini 3 is now fully available. Starting today, free and paid subscribers can access the new model via the Gemini App and Search AI Mode; developers and enterprise customers can integrate via AI Studio, Vertex AI, and other channels. The much-anticipated "Deep Think mode" is expected to roll out exclusively to Google AI Ultra subscribers in the coming weeks.

Additionally, according to previously leaked model cards, there are many key details worth noting: Google trained this model from scratch using TPUs. As a MoE, it supports 1M input and 64k output tokens—being a MoE means they can afford to make it cheaper.

Pricing-wise, Gemini 3.0 Pro introduces a tiered pricing model based on context length: for tasks under 200k tokens, input/output costs are $2.00/$12.00 per million tokens; beyond 200k tokens, they rise to $4.00 and $18.00 respectively.

A New "Agent-First" Development Experience

Google Antigravity is Google’s new agent development platform, enabling developers to operate at a higher, task-oriented level. Leveraging Gemini 3’s advanced reasoning, tool usage, and agent programming capabilities, Google Antigravity transforms AI assistance from a mere tool in the developer’s toolkit into an active partner.

While Google Antigravity retains the familiar AI IDE (integrated development environment) experience at its core, its agents now have a dedicated interface with direct access to the editor, terminal, and browser. Agents can now autonomously plan and execute complex end-to-end software tasks on your behalf, while also verifying their own code.

Besides Gemini 3 Pro, Google Antigravity tightly integrates Google’s latest Gemini 2.5 Computer Use model for browser control and its top-tier image editing model, Nano Banana (Gemini 2.5 Image).

Hands-On Experience

Now that the Gemini 3 Pro preview is live on the AI Studio platform, we gave it a try.

Prompt: SVG of NEW YORK SKYLINE Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome.make it interesting and highly detail , shows details that no one expected go full creative and full beauty in one code block.

Prompt: Create a visually stunning Space Invaders game.

The pelican riding a bicycle—an animated SVG challenge that stumped many large models—was something we also tested on Gemini 3. Prompt: An animated SVG of a pelican riding a bicycle.

Compared to previous versions, Gemini 3 has made significant progress, though bugs remain—for example, the bike pedals spin in midair.

We tried a clearer prompt: Create a single, complete, self-contained animated SVG code (no external files or images) of a cute pelican riding a bicycle from a side view. This time, the bicycle generated by Gemini 3 seemed to lack pedals altogether.

Final Thoughts

In X blogger Chubby’s poll asking “Which company will have the best LLM by end of 2026?”, Google Gemini is far ahead.

This renewed market confidence is also reflected in data. In Alphabet CEO Sundar Pichai’s official blog post reviewing Gemini’s progress over the past two years: AI Overviews now has 2 billion monthly active users, the Gemini app has surpassed 650 million monthly actives, and over 70% of cloud customers plus 13 million developers are using its generative models.

Looking back two years—from Bard’s (Gemini’s predecessor) rushed launch and stock plunge, to Google’s painful realization, merging with DeepMind, recalling founders, and even winning a Nobel Prize—Google has executed a textbook-perfect "elephant turn."

The giant that once defined the Transformer, now "All in Gemini," is ready for a full-scale counterattack.

Whether it will finally settle the debate over the "best LLM"? Don’t rush—let the bullets (and servers) fly a little longer.

Join TechFlow official community to stay tuned

Telegram:https://t.me/TechFlowDaily

X (Twitter):https://x.com/TechFlowPost

X (Twitter) EN:https://x.com/BlockFlow_News

Source

Add to Favorites

Share to Social Media

Author

机器之心Synced

@Synced_Global

Gemini 3 Launches Late Night: Outshines GPT-5.1, Ushering in Google's Era of Large Models

TechFlow Selected TechFlow Selected

Gemini 3 Launches Late Night: Outshines GPT-5.1, Ushering in Google's Era of Large Models