• Google Cloud debuted the TPU 8t for training and the TPU 8i for inference, splitting its AI chip line into separate training and inference designs for the first time.
  • The TPU 8i supports up to 1,152 chips organized via the Boardfly hierarchical network topology for distributed inference workloads.
  • Both chips pair with Axion Arm-based host CPUs at a 2:1 TPU-to-CPU ratio, replacing the previous x86 configuration.

Google Cloud introduced its eighth-generation Tensor Processing Units this week, breaking from its traditional unified architecture to offer two distinct chips: the TPU 8t for training large AI models and the TPU 8i for running inference on those models.

The announcement at Cloud Next 2026 marks Google’s most aggressive attempt yet to challenge Nvidia’s dominance in AI accelerators.

The split reflects how AI infrastructure has matured into specialized domains. Training requires raw compute density and high-bandwidth interconnects to process massive datasets across thousands of chips, while inference prioritizes efficiency and low latency for serving predictions to users. By dedicating silicon to each task, Google aims to match or exceed Nvidia’s performance in both categories. The chips will be generally available through Google Cloud later this year.

“Google’s custom TPU program represents perhaps the pinnacle of hyper-scale ASIC designs,” noted Patrick Kennedy at ServeTheHome. The assessment carries weight: Google has been deploying TPUs in production since 2016, giving it years of real-world performance data to inform the eighth-generation design. Unlike competitors building general-purpose GPUs, Google hones its silicon specifically for internal workloads while offering cloud access to external customers.

Why Google Bet on Boardfly Topology

Both chips incorporate Google’s Boardfly hierarchical network topology, enabling efficient scaling from individual boards to multi-rack configurations. The TPU 8i specifically supports up to 1,152 chips organized into 36 groups across 8 boards, facilitating all-reduce operations critical for distributed inference workloads where multiple chips must coordinate to process a single request.
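
Google has not published Boardfly’s internals, but the pattern it targets (reduce locally first, then across the hierarchy) is straightforward to sketch. Below is a minimal two-stage all-reduce in JAX. It is a generic illustration of hierarchical reduction, not Google’s software stack, and the “board” and “chip” axis names and the 2×4 device grid are assumptions made for the example.

```python
from functools import partial

import jax
import jax.numpy as jnp

# Generic two-level all-reduce sketch, not Google's stack. Requires eight
# visible devices; on a CPU-only machine they can be simulated with
#   XLA_FLAGS=--xla_force_host_platform_device_count=8
@partial(jax.pmap, axis_name="board")  # outer axis: 2 hypothetical boards
@partial(jax.pmap, axis_name="chip")   # inner axis: 4 chips per board
def hierarchical_sum(x):
    local = jnp.sum(x)                        # each chip reduces its shard
    per_board = jax.lax.psum(local, "chip")   # stage 1: within a board
    return jax.lax.psum(per_board, "board")   # stage 2: across boards

x = jnp.arange(2 * 4 * 8, dtype=jnp.float32).reshape(2, 4, 8)
print(hierarchical_sum(x))  # (2, 4) array; every entry is the global sum
```

The two-stage shape matters because intra-board links are generally faster than inter-board ones; doing most of the reduction locally keeps traffic over the slower hops to a minimum.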

The networking architecture addresses one of the biggest bottlenecks in modern AI clusters: communication bandwidth between chips. Boardfly is built for the communication patterns common in transformer-based models, which now dominate production AI systems. Google claims improved sparsity support and enhanced matrix multiplication units on the 8i, with particular emphasis on INT8 and FP8 operations that have become standard for production inference deployments.
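
The draw of INT8 for serving is easy to show in miniature: quantize both operands to 8-bit integers, accumulate the matmul in a wider type, and rescale. The sketch below illustrates that arithmetic pattern in JAX; it models symmetric post-training quantization generically and says nothing about the 8i’s actual matrix units.

```python
import jax.numpy as jnp

def quantize_int8(x):
    # Symmetric per-tensor quantization: map floats onto [-127, 127].
    scale = jnp.max(jnp.abs(x)) / 127.0
    q = jnp.clip(jnp.round(x / scale), -127, 127).astype(jnp.int8)
    return q, scale

def int8_matmul(a, b):
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    # Multiply int8 operands but accumulate in int32, as hardware matrix
    # units typically do, then rescale the result back to float.
    acc = jnp.matmul(qa.astype(jnp.int32), qb.astype(jnp.int32))
    return acc.astype(jnp.float32) * (sa * sb)

a = jnp.array([[0.5, -1.0], [2.0, 0.25]])
b = jnp.array([[1.5, 0.0], [-0.5, 1.0]])
print(int8_matmul(a, b))  # close to a @ b at a quarter of FP32's bandwidth
```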

The TPU 8i platform pairs each accelerator with Axion Arm-based CPUs at a 2:1 TPU-to-CPU ratio, a shift from the x86 processors Google previously used for TPU host processing. The change is a notable validation of Arm’s data center ambitions; the architecture has been steadily gaining traction against Intel and AMD. Sundar Pichai displayed both chips on stage at Cloud Next 2026, emphasizing that the 8 series is positioned against Nvidia’s Vera Rubin generation rather than the previous Grace Blackwell line.

Training at Scale: What the TPU 8t Brings

While the TPU 8i focuses on serving production traffic, the TPU 8t targets the massive computational requirements of training frontier AI models. The chip scales to pod configurations where thousands of accelerators operate as a unified training fabric, with the same Boardfly hierarchical topology enabling efficient scaling across rack boundaries.
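
In software terms, “a unified training fabric” means every chip computes gradients on its own slice of the batch and an all-reduce averages them before each update, so all replicas stay in lockstep. A minimal data-parallel sketch in JAX follows; the toy linear model and the “pod” axis name are our own illustration, not Google’s training stack.

```python
from functools import partial

import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    # Toy linear model: frontier training differs in scale, not in pattern.
    return jnp.mean((x @ w - y) ** 2)

@partial(jax.pmap, axis_name="pod")
def train_step(w, x, y):
    grads = jax.grad(loss_fn)(w, x, y)
    # All-reduce: average gradients across every chip on the "pod" axis so
    # each replica applies an identical update.
    grads = jax.lax.pmean(grads, axis_name="pod")
    return w - 0.01 * grads

n = jax.local_device_count()
w = jnp.zeros((n, 8, 1))   # parameters replicated, one copy per chip
x = jnp.ones((n, 16, 8))   # each chip trains on its own batch shard
y = jnp.ones((n, 16, 1))
w = train_step(w, x, y)    # one synchronized step across the fabric
```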

The 8t builds directly on lessons from Ironwood, Google’s seventh-generation TPU that introduced expanded precision support and broader cloud availability. According to Google’s technical documentation, the TPU 8t offers improved efficiency metrics and larger pod configurations specifically designed for frontier model training requirements.

Behind Google’s TPU push lies a fundamental bet: that specialized silicon can outperform general-purpose GPUs for the specific workloads that matter most. Google has been running TPUs in production for nearly a decade, using them to train models like Gemini and serve billions of queries daily. That operational experience shapes the eighth generation in ways competitors cannot easily replicate.

For context on Google’s infrastructure ambitions, see our coverage of the company’s recent multi-billion-dollar deal with Mira Murati’s Thinking Machines Lab. That agreement secured guaranteed capacity for training runs, suggesting Google is positioning the TPU 8t specifically to compete for large-scale training contracts against Nvidia’s offerings.

Despite the specialized chips, Google remains a major Nvidia customer, buying substantial quantities of Vera Rubin generation hardware even as it promotes the 8t and 8i. This dual-vendor strategy reflects reality: TPUs serve a narrower market segment than Nvidia’s broad ecosystem spanning enterprise, cloud, workstation, and edge deployments. For workloads where CUDA ecosystem compatibility or specific hardware capabilities matter, Nvidia still wins. But for customers building on Google’s stack, the specialized chips offer efficiency gains that general-purpose accelerators struggle to match.

Neither Google nor analysts expect TPUs to displace Nvidia from the market. Rather, the eighth generation positions Google as a viable alternative for customers willing to commit to its ecosystem. For companies already using Google Cloud, the efficiency benefits may prove compelling enough to consider the switch.

The 8t and 8i will be available to Google Cloud customers later this year, with pricing and exact specifications expected at general availability.
