TechFlow News, June 27: Coinbase CEO Brian Armstrong shared updates on the company’s latest progress in AI cost optimization.
Armstrong stated that, as AI usage and token consumption continue to rise, the key to controlling costs lies not in restricting employee access or frequently sending budget alerts, but rather in optimizing default model selection, task routing mechanisms, and caching strategies.
He said Coinbase is experimenting, through its internal LLM gateway, with open-weight models such as GLM 5.2 and Kimi 2.7 as default options, while still allowing engineers to select other models for specific tasks. Internal data shows that 91% of employees have never reached their AI usage quota, so rather than tightening quotas, Coinbase has chosen to improve overall efficiency by defaulting to lower-cost models.
On model routing, Coinbase pre-processes prompts and automatically routes each task to the most suitable model based on cache hit rates and pricing differences across models. Armstrong noted that complex tasks such as planning and reasoning may require state-of-the-art models, whereas execution-oriented tasks often do not need the more expensive ones. He added that model selection should increasingly be automated by AI rather than left to manual decisions.
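The routing idea described above can be illustrated with a minimal Python sketch. It is not Coinbase's implementation: the model names, prices, cached-token discounts, and the rule that planning and reasoning tasks go only to frontier models are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                 # hypothetical model identifier
    price_per_mtok: float     # assumed USD per million uncached input tokens
    cached_price_ratio: float # assumed fraction of the price charged for cached tokens
    frontier: bool            # True for a state-of-the-art model


def expected_cost(model: Model, tokens: int, cache_hit_rate: float) -> float:
    """Estimate input cost given the share of tokens expected to hit the cache."""
    cached = tokens * cache_hit_rate
    uncached = tokens - cached
    billable = uncached + cached * model.cached_price_ratio
    return billable * model.price_per_mtok / 1_000_000


def route(task_kind: str, tokens: int, models: list[Model],
          hit_rates: dict[str, float]) -> Model:
    """Pick the cheapest eligible model for a task.

    Assumption: planning and reasoning tasks are restricted to frontier
    models; every other task may use any model.
    """
    needs_frontier = task_kind in {"planning", "reasoning"}
    candidates = [m for m in models if m.frontier] if needs_frontier else list(models)
    if not candidates:  # fall back to all models if no frontier model is configured
        candidates = list(models)
    return min(
        candidates,
        key=lambda m: expected_cost(m, tokens, hit_rates.get(m.name, 0.0)),
    )


# Hypothetical catalog; names and prices are made up for this example.
MODELS = [
    Model("frontier-x", price_per_mtok=10.0, cached_price_ratio=0.1, frontier=True),
    Model("open-y", price_per_mtok=1.0, cached_price_ratio=0.5, frontier=False),
]

planning_choice = route("planning", 100_000, MODELS, {})
execution_choice = route("execution", 100_000, MODELS, {})
```

Under these assumed inputs, the planning task is sent to the frontier model and the execution task to the cheaper open-weight one. A production router would presumably classify the task from the prompt itself rather than take a label as input.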
He also pointed out that cache hit rate is one of the critical factors influencing AI costs. Coinbase has built cache-aware logic into its request pipeline to increase reuse of earlier results. For example, after the caching strategy was optimized, the cache hit rate for LibreChat rose from 5% to 60%.
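To see why the hit rate matters so much, consider a rough Python sketch of blended input cost. The cached-token price ratio of 0.1 is a hypothetical assumption, not a figure from Coinbase or any specific provider; only the 5% and 60% hit rates come from the report.

```python
def relative_input_cost(hit_rate: float, cached_price_ratio: float) -> float:
    """Input cost relative to a fully uncached request (1.0 = no savings).

    hit_rate: fraction of input tokens served from cache, between 0 and 1.
    cached_price_ratio: assumed price of a cached token as a fraction of
    the price of an uncached token.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


ASSUMED_CACHED_PRICE_RATIO = 0.1  # hypothetical discount, for illustration only

before = relative_input_cost(0.05, ASSUMED_CACHED_PRICE_RATIO)
after = relative_input_cost(0.60, ASSUMED_CACHED_PRICE_RATIO)
savings = 1.0 - after / before
```

With that assumed ratio, the relative input cost falls from roughly 0.96 to 0.46 of the uncached price, a reduction of about half on input tokens alone. The real effect depends on actual cached-token pricing and on how much of total spend is input tokens.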
Armstrong also emphasized that engineers are encouraged to keep contexts concise to minimize unnecessary token consumption: starting a new session when switching tasks, narrowing the scope of file context, and disabling unused tools.
According to Armstrong, these measures have reduced Coinbase’s AI spending by nearly 50%, even as token usage continues to grow.




