Coinbase: Has Reduced AI Spending by Nearly 50% and Is Experimenting with Defaulting to Open-Weight Models

TechFlow News, June 27: Coinbase CEO Brian Armstrong shared updates on the company’s latest progress in AI cost optimization.

Armstrong stated that, as AI usage and token consumption continue to rise, the key to controlling costs lies not in restricting employee access or frequently sending budget alerts, but rather in optimizing default model selection, task routing mechanisms, and caching strategies.

He revealed that Coinbase is experimenting, via its internal LLM gateway, with open-weight models such as GLM 5.2 and Kimi 2.7 as default options, while still allowing engineers to select other models for specific tasks. Data shows that 91% of employees have never reached their AI usage quota, so rather than tightening quotas, Coinbase has chosen to improve cost efficiency by defaulting to lower-cost models.

Regarding model routing, Coinbase pre-processes prompts and automatically routes each task to the most suitable model based on cache hit rates and pricing differences across models. Armstrong noted that complex tasks, such as planning and reasoning, may require state-of-the-art models, whereas execution-oriented tasks may not need higher-cost models. In the future, he said, model selection should be increasingly automated by AI rather than left to manual decisions.
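The article does not describe how Coinbase's router is implemented. As a rough illustration of routing on cache hit rates and price differences, the Python sketch below picks the cheapest model that is considered adequate for a task. The model labels, prices, hit rates, and the `handles_complex` flag are all hypothetical values chosen for the example, not figures from Coinbase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical label, not a real model ID
    input_price: float         # USD per million uncached input tokens (made up)
    cached_input_price: float  # USD per million cached input tokens (made up)
    handles_complex: bool      # assumed fit for planning/reasoning tasks


def expected_input_cost(model: ModelOption, tokens: int, cache_hit_rate: float) -> float:
    """Blend cached and uncached prices by the expected cache hit rate."""
    cached = tokens * cache_hit_rate
    uncached = tokens - cached
    return (cached * model.cached_input_price + uncached * model.input_price) / 1_000_000


def route(task_kind: str, tokens: int, models: list[ModelOption],
          cache_hit_rates: dict[str, float]) -> ModelOption:
    """Return the cheapest model that is allowed to handle this kind of task."""
    needs_complex = task_kind in {"planning", "reasoning"}
    candidates = [m for m in models if m.handles_complex or not needs_complex]
    if not candidates:
        raise ValueError(f"no model available for task kind {task_kind!r}")
    return min(
        candidates,
        key=lambda m: expected_input_cost(m, tokens, cache_hit_rates.get(m.name, 0.0)),
    )


# Illustrative values only.
MODELS = [
    ModelOption("frontier", input_price=10.0, cached_input_price=1.0, handles_complex=True),
    ModelOption("open-weight", input_price=1.0, cached_input_price=0.2, handles_complex=False),
]
HIT_RATES = {"frontier": 0.5, "open-weight": 0.0}

if __name__ == "__main__":
    print(route("execution", 100_000, MODELS, HIT_RATES).name)  # open-weight
    print(route("planning", 100_000, MODELS, HIT_RATES).name)   # frontier
```

In this toy setup the execution task goes to the cheaper model even though the expensive one has a warmer cache, while the planning task is restricted to the model flagged as suitable for complex work. A production router would presumably also weigh output-token prices, latency, and quality signals, none of which are modeled here.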

Additionally, he pointed out that cache hit rate is one of the critical factors influencing AI costs. Coinbase has integrated cache-aware logic into its request pipeline to increase reuse of historical results. For example, after optimizing its caching strategy, LibreChat’s cache hit rate rose from 5% to 60%.
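To see why the hit rate matters so much, the short Python sketch below computes a blended input-token price at the two hit rates mentioned above. The prices are assumptions made for illustration (a base price of 1.00 USD per million input tokens and a cached price of 0.25 USD); real provider pricing varies, and the result is not a figure reported by Coinbase.

```python
def blended_input_price(hit_rate: float, price: float, cached_price: float) -> float:
    """Average price per million input tokens at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_price + (1.0 - hit_rate) * price


# Hypothetical prices in USD per million input tokens.
PRICE = 1.00
CACHED_PRICE = 0.25

before = blended_input_price(0.05, PRICE, CACHED_PRICE)  # 5% hit rate
after = blended_input_price(0.60, PRICE, CACHED_PRICE)   # 60% hit rate
saving = 1.0 - after / before

print(f"{before:.4f} -> {after:.4f} ({saving:.1%} lower)")
# 0.9625 -> 0.5500 (42.9% lower)
```

The size of the saving depends entirely on the assumed cached-token discount and covers input tokens only, so it should be read as a worked example of the mechanism rather than an estimate of actual savings.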

Armstrong also emphasized that engineers are encouraged to keep contexts concise—by initiating new sessions when switching tasks, narrowing file context scope, and disabling unused tools—to minimize unnecessary token consumption.

According to Armstrong, these measures have reduced Coinbase’s AI spending by nearly 50%, even as token usage continues to grow.
