Google Cloud unveiled its seventh-generation Tensor Processing Unit (TPU), Ironwood, on Wednesday. This custom AI accelerator, the company claims, delivers more than 24 times the computing power of the world's fastest supercomputer when deployed at scale.

The new chip, announced at Google Cloud Next '25, represents a significant pivot in Google's decade-long AI chip development strategy. While previous generations of TPUs were designed primarily for both training and inference workloads, Ironwood is the first built specifically for inference: the process of deploying trained AI models to make predictions or generate responses.

"Ironwood is built to support this next phase of generative AI and its tremendous computational and communication requirements," said Amin Vahdat, Google's Vice President and General Manager of ML, Systems, and Cloud AI, in a virtual press conference ahead of the event. "This is what we call the 'age of inference,' where AI agents will proactively retrieve and generate data to collaboratively deliver insights and answers, not just data."
Shattering computational barriers: Inside Ironwood's 42.5 exaflops of AI muscle

The technical specifications of Ironwood are striking. When scaled to 9,216 chips per pod, Ironwood delivers 42.5 exaflops of computing power, dwarfing the 1.7 exaflops of El Capitan, currently the world's fastest supercomputer. Each individual Ironwood chip delivers peak compute of 4,614 teraflops.

Ironwood also features significant memory and bandwidth improvements. Each chip comes with 192GB of High Bandwidth Memory (HBM), six times more than Trillium, Google's previous-generation TPU announced last year. Memory bandwidth reaches 7.2 terabits per second per chip, a 4.5x improvement over Trillium.
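The arithmetic behind these headline figures is easy to verify from the numbers above. One caveat worth flagging: Google's exaflops rating reflects low-precision peak compute, while El Capitan's 1.7 exaflops is a double-precision benchmark result, so the comparison is not strictly like-for-like. A quick sanity check in Python:

```python
# Sanity-check the reported Ironwood figures (all inputs as stated by Google).
chips_per_pod = 9_216
peak_per_chip_tflops = 4_614

pod_exaflops = chips_per_pod * peak_per_chip_tflops / 1e6  # teraflops -> exaflops
print(f"Pod peak: {pod_exaflops:.1f} exaflops")            # ~42.5, matching the claim

el_capitan_exaflops = 1.7
print(f"vs. El Capitan: {pod_exaflops / el_capitan_exaflops:.0f}x")  # ~25x

# Trillium figures implied by the stated multipliers
print(f"Implied Trillium HBM: {192 / 6:.0f} GB")            # 32 GB per chip
print(f"Implied Trillium bandwidth: {7.2 / 4.5:.1f} Tbps")  # 1.6 Tbps per chip
```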
Perhaps most significantly, in an era of power-constrained data centers, Ironwood delivers twice the performance per watt of Trillium and is nearly 30 times more power efficient than Google's first Cloud TPU from 2018.

"At a time when available power is one of the constraints for delivering AI capabilities, we deliver significantly more capacity per watt for customer workloads," Vahdat explained.
From model building to 'thinking machines': Why Google's inference focus matters now

The emphasis on inference rather than training marks a significant inflection point in the AI timeline. For years, the industry has been fixated on building ever-larger foundation models, with companies competing primarily on parameter count and training capability. Google's pivot to inference optimization suggests we are entering a new phase in which deployment efficiency and reasoning capability take center stage.

The transition makes sense. Training happens once, but inference operations occur billions of times daily as users interact with AI systems. The economics of AI are increasingly tied to inference costs, especially as models grow more complex and computationally intensive.
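A deliberately simplified model makes the point concrete. The numbers below are illustrative assumptions, not Google figures; the takeaway is that at production query volumes, cumulative inference spend overtakes a one-time training bill within months, so per-query efficiency gains compound quickly:

```python
# Illustrative only: hypothetical costs showing why inference economics
# dominate at scale. None of these numbers come from Google.
training_cost = 50_000_000     # one-time training cost, USD (assumed)
cost_per_query = 0.002         # inference cost per query, USD (assumed)
queries_per_day = 100_000_000  # daily query volume (assumed)

daily_inference = cost_per_query * queries_per_day
print(f"Daily inference spend: ${daily_inference:,.0f}")                        # $200,000
print(f"Days to eclipse training cost: {training_cost / daily_inference:.0f}")  # 250

# In this scenario, doubling efficiency (e.g., perf/watt gains) saves more
# per year than the entire training run cost.
print(f"Annual savings from 2x efficiency: ${daily_inference / 2 * 365:,.0f}")
```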
During the press conference, Vahdat revealed that Google has observed a 10x year-over-year increase in demand for AI compute over the past eight years, a staggering factor of 100 million overall. No amount of Moore's Law progress could satisfy this growth curve without specialized architectures like Ironwood.
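That compounding is easy to check: ten-fold annual growth sustained for eight years multiplies out to exactly the figure Vahdat cited.

```python
# 10x year-over-year growth, compounded over eight years
print(f"{10 ** 8:,}x")  # 100,000,000x, the "factor of 100 million"
```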
What’s significantly notable is the concentrate on “considering fashions” that carry out advanced reasoning duties quite than easy sample recognition. This implies that Google sees the way forward for AI not simply in bigger fashions, however in fashions that may break down issues, cause by a number of steps and simulate human-like thought processes.
Gemini’s considering engine: How Google’s next-gen fashions leverage superior {hardware}
Google is positioning Ironwood as the muse for its most superior AI fashions, together with Gemini 2.5, which the corporate describes as having “considering capabilities natively in-built.”
On the convention, Google additionally introduced Gemini 2.5 Flash, a cheaper model of its flagship mannequin that “adjusts the depth of reasoning primarily based on a immediate’s complexity.” Whereas Gemini 2.5 Professional is designed for advanced use instances like drug discovery and monetary modeling, Gemini 2.5 Flash is positioned for on a regular basis purposes the place responsiveness is crucial.
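As a rough sketch of what adjustable reasoning depth looks like to a developer, here is how a thinking budget can be set with the google-genai Python SDK. The SDK and parameter names reflect how Google later exposed the feature and are assumptions relative to the announcement itself:

```python
# Hypothetical sketch (pip install google-genai): cap how many tokens
# Gemini 2.5 Flash may spend on internal reasoning before it answers.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder credential

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the tradeoffs between training and inference costs.",
    config=types.GenerateContentConfig(
        # Higher budgets allow deeper reasoning; 0 disables thinking entirely.
        thinking_config=types.ThinkingConfig(thinking_budget=512)
    ),
)
print(response.text)
```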
The company also demonstrated its full suite of generative media models, including text-to-image, text-to-video, and a newly announced text-to-music capability called Lyria. A demonstration showed how these tools could be used together to create a complete promotional video for a concert.

Beyond silicon: Google's comprehensive infrastructure strategy spans network and software

Ironwood is just one part of Google's broader AI infrastructure strategy. The company also announced Cloud WAN, a managed wide-area network service that gives businesses access to Google's planet-scale private network infrastructure.
"Cloud WAN is a fully managed, reliable, and secure enterprise networking backbone that provides up to 40% improved network performance, while also reducing total cost of ownership by that same 40%," Vahdat said.
Google is also expanding its software offerings for AI workloads, including Pathways, its machine learning runtime developed by Google DeepMind. Pathways on Google Cloud lets customers scale model serving out across hundreds of TPUs.
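Pathways itself is Google-internal infrastructure, but the core idea, one program transparently sharded across many TPU chips, can be sketched in JAX, which Pathways was built to serve. The snippet below is an illustrative stand-in, not the Pathways API:

```python
# Illustrative scale-out serving sketch in JAX: shard a request batch across
# every available accelerator and run one jitted forward pass over all of them.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

params = jnp.ones((512, 64))  # toy stand-in for real model weights

@jax.jit
def serve(params, batch):
    return batch @ params  # stand-in for a full forward pass

# Split incoming requests across the device mesh. On a TPU pod slice,
# jax.device_count() would report hundreds of chips rather than one.
batch = jax.device_put(
    jnp.ones((jax.device_count() * 8, 512)),
    NamedSharding(mesh, P("data")),
)
print(serve(params, batch).shape)  # (num_devices * 8, 64)
```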
AI economics: How Google's $12 billion cloud business plans to win the efficiency war

These hardware and software announcements come at a crucial time for Google Cloud, which reported $12 billion in Q4 2024 revenue, up 30% year over year, in its latest earnings report.

The economics of AI deployment are increasingly becoming a differentiating factor in the cloud wars. Google faces intense competition from Microsoft Azure, which has leveraged its OpenAI partnership into a formidable market position, and Amazon Web Services, which continues to expand its Trainium and Inferentia chip offerings.

What separates Google's approach is its vertical integration. While rivals partner with chip manufacturers or acquire startups, Google has been developing TPUs in-house for over a decade. This gives the company unparalleled control over its AI stack, from silicon to software to services.

By bringing this technology to enterprise customers, Google is betting that its hard-won experience building chips for Search, Gmail, and YouTube will translate into competitive advantages in the enterprise market. The strategy is clear: offer the same infrastructure that powers Google's own AI, at scale, to anyone willing to pay for it.

The multi-agent ecosystem: Google's audacious plan for AI systems that work together

Beyond hardware, Google outlined a vision for AI centered on multi-agent systems. The company announced an Agent Development Kit (ADK) that lets developers build systems in which multiple AI agents work together.
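The ADK subsequently shipped as a Python package; as a minimal illustration of its programming model (package and parameter names below come from the released google-adk library and should be read as assumptions, not as what was shown on stage):

```python
# Minimal single-agent sketch with the Agent Development Kit
# (pip install google-adk). Treat names here as assumptions.
from google.adk.agents import Agent

support_agent = Agent(
    name="support_agent",
    model="gemini-2.0-flash",  # any Gemini model id
    description="Handles billing questions for the storefront team.",
    instruction="Answer concisely; escalate anything involving refunds.",
)
```

Multi-agent systems are then composed by nesting such agents, with a coordinating agent delegating work to sub-agents.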
Perhaps most importantly, Google announced an agent-to-agent interoperability protocol (A2A) that allows AI agents built on different frameworks and by different vendors to communicate with one another.
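Protocol details were still emerging at announcement time, but the published A2A draft is JSON-RPC over HTTP: one agent posts a structured task to another and polls or streams the result. The field names below approximate that early shape and should be treated as assumptions:

```python
# Hypothetical A2A-style task request, approximating the early JSON-RPC draft.
import json
import uuid

task_request = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "tasks/send",
    "params": {
        "id": str(uuid.uuid4()),  # task identifier
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Book a venue for March 12."}],
        },
    },
}
print(json.dumps(task_request, indent=2))  # payload one agent POSTs to another
```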
"2025 will be a transition year where generative AI shifts from answering single questions to solving complex problems through agentic systems," Vahdat predicted.

Google is partnering with more than 50 industry leaders, including Salesforce, ServiceNow, and SAP, to advance this interoperability standard.
Enterprise reality check: What Ironwood's power and efficiency mean for your AI strategy

For enterprises deploying AI, these announcements could significantly reduce the cost and complexity of running sophisticated AI models. Ironwood's improved efficiency could make advanced reasoning models more economical to run, while the agent interoperability protocol could help businesses avoid vendor lock-in.

The real-world impact of these advances should not be underestimated. Many organizations have been reluctant to deploy advanced AI models because of prohibitive infrastructure costs and energy consumption. If Google delivers on its performance-per-watt promises, we could see a new wave of AI adoption in industries that have so far remained on the sidelines.

The multi-agent approach is equally significant for enterprises overwhelmed by the complexity of deploying AI across different systems and vendors. By standardizing how AI systems communicate, Google is attempting to break down the silos that have limited AI's enterprise impact.

During the press conference, Google emphasized that more than 400 customer stories will be shared at Next '25, showcasing real business impact from its AI innovations.
The silicon arms race: Will Google's custom chips and open standards reshape AI's future?

As AI advances, its infrastructure will become increasingly critical. Google's investments in specialized hardware like Ironwood and its agent interoperability initiatives suggest the company is positioning itself for a future where AI becomes more distributed, more complex, and more deeply integrated into business operations.

"Leading thinking models like Gemini 2.5 and the Nobel Prize-winning AlphaFold all run on TPUs today," Vahdat noted. "With Ironwood, we can't wait to see what AI breakthroughs are sparked by our own developers and Google Cloud customers when it becomes available later this year."

The strategic implications extend beyond Google's own business. By pushing for open standards in agent communication while maintaining proprietary advantages in hardware, Google is attempting a delicate balancing act: it wants the broader ecosystem to flourish, with Google infrastructure underneath, while preserving its competitive differentiation.

In the months ahead, the key questions are how quickly competitors respond to Google's hardware advances and whether the industry coalesces around the proposed agent interoperability standards. If history is any guide, Microsoft and Amazon will counter with their own inference-optimization strategies, potentially setting up a three-way race to build the most efficient AI infrastructure stack.