All Google chip design now takes place in the cloud

Besides the Tensor SoC on Pixel phones, Google has developed other custom chips, with all silicon design now occurring in the cloud.

Previously, Google’s chip development infrastructure team used “dozens of racks and hundreds of servers” in a data center.

As projects began to mount, so did the implementation challenges, with hardware costs doubling annually, and each new initiative requiring new engineers and infrastructure. When the team was prioritizing hiring engineers simply to manage and optimize legacy machines, they knew they were losing sight of their core focus: growth and innovation. 

Later on, this team “explored a hybrid solution using Google’s internal software design environment, and some Electronic Design Automation (EDA) workloads sent into Google Cloud.”

While the approach was reliable in the short-term, delays in transferring workloads for analysis would leave engineers waiting around for results. The added burden of having two desktops running concurrently, one for their design environment and one for their results in Google Cloud, led to a rethink. 

The chip division eventually decided to migrate fully to the cloud with the help of an internal “Alphabet Cloud” team that’s responsible for “helping teams across Alphabet accelerate their adoption of Google Cloud’s unique offerings to drive faster development and scale, just like a customer’s platform team would.” The team is using Google Kubernetes Engine (GKE) for containers, as well as Cloud Storage, Filestore, Cloud Spanner, BigQuery, and Pub/Sub for data.

This transition allowed the chip group to use Google Cloud’s existing ML algorithms to “efficiently navigate large search spaces and apply unique optimizations at various stages of chip design.” 

This resulted in a shortened chip design process, reduced time-to-market, expanded product areas for ML accelerators, and improved efficiency.

Since it’s easier to add more compute resources, “chip designers were able to run more jobs to weed out bugs.” 

Since moving to Google Cloud, the team increased daily job submissions by 170% over the past year while maintaining a flat scheduling latency. The workload is supported across 250+ GKE clusters spanning multiple Google Cloud regions.

On the business side, the team reduced operational costs, found infrastructure bugs faster, and spent “less time on data center maintenance.”

The team said that “all Google chip design projects are using Google Cloud” now.

The chip design team has launched full designs built using Google Cloud, including the last two generations of TPUs and YouTube’s video accelerator program, Argos VCU.

Author

Abner Li

Editor-in-chief. Interested in the minutiae of Google and Alphabet. Tips/talk: abner@9to5g.com