Google TPU v7 Cuts Inference Costs 70%, Challenging Nvidia in AI Chip Economics
Google’s TPU v7 cuts inference costs by approximately 70% compared with TPU v6, narrowing the gap with Nvidia’s GB200 NVL72 and matching it on key cost metrics, according to a Goldman Sachs report published January 21, 2026. The shift reflects a broader industry pivot from raw compute speed to sustainable, low-cost AI execution.

Goldman Sachs’ analysis of depreciation, power consumption, and system utilization finds that the TPU’s efficiency stems from system-level integration: high-bandwidth interconnects, HBM memory, TSMC’s CoWoS packaging, and rack-scale optimizations. Google now uses TPUs widely for Gemini model inference, and Anthropic has placed a $2.1 billion order with Broadcom (AVGO-US) for future TPU-based systems, with delivery expected in mid-2026.

Although Nvidia (NVDA-US) retains its lead-time and CUDA ecosystem advantages, AMD (AMD-US) and Amazon (AMZN-US) lag behind on cost reduction. Goldman maintains “Buy” ratings on Nvidia and Broadcom and forecasts a clear division of labor: GPUs dominate training and general compute, while custom ASICs gain traction in scalable, predictable inference workloads as AI enters a “token-by-token ROI” era.
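To make the cost framing concrete, the sketch below shows one common way to amortize hardware depreciation and power into a cost per million generated tokens, the kind of "token-by-token" metric the report alludes to. This is an illustrative model only; the function name, the specific formula, and all numeric inputs are placeholder assumptions and are not figures from the Goldman Sachs analysis.

```python
# Illustrative sketch (not from the Goldman Sachs report): amortizing
# depreciation, power, and utilization into a per-token serving cost.
# All numbers below are placeholder assumptions, not real system data.

def cost_per_million_tokens(
    system_price_usd: float,         # purchase price of the accelerator system
    depreciation_years: float,       # straight-line depreciation horizon
    power_draw_kw: float,            # average system power draw under load
    electricity_usd_per_kwh: float,  # blended electricity price
    tokens_per_second: float,        # sustained inference throughput
    utilization: float,              # fraction of wall-clock time serving traffic
) -> float:
    """Amortized serving cost (USD) per one million generated tokens."""
    hours_per_year = 365 * 24
    # Hourly cost of owning the hardware, spread evenly over its lifetime.
    depreciation_per_hour = system_price_usd / (depreciation_years * hours_per_year)
    # Hourly electricity cost at the assumed draw.
    power_per_hour = power_draw_kw * electricity_usd_per_kwh
    # Tokens actually produced per hour, discounted by utilization.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (depreciation_per_hour + power_per_hour) / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Two hypothetical systems with made-up inputs; NOT TPU v7 or GB200 figures.
    baseline = cost_per_million_tokens(
        system_price_usd=250_000, depreciation_years=4,
        power_draw_kw=10, electricity_usd_per_kwh=0.08,
        tokens_per_second=20_000, utilization=0.6,
    )
    candidate = cost_per_million_tokens(
        system_price_usd=200_000, depreciation_years=4,
        power_draw_kw=8, electricity_usd_per_kwh=0.08,
        tokens_per_second=50_000, utilization=0.6,
    )
    print(f"baseline:  ${baseline:.4f} per 1M tokens")
    print(f"candidate: ${candidate:.4f} per 1M tokens")
    print(f"cost reduction: {1 - candidate / baseline:.0%}")
```

Under a model like this, a roughly 70% cost reduction can come from throughput and power gains compounding at the system level rather than from any single component, which is consistent with the report's emphasis on interconnects, memory, packaging, and rack-scale design.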