However, Google, too, seems to acknowledge that “great software shines brightest with great hardware underneath”. This is why, over the past several years, the company has worked on a custom ASIC (application-specific integrated circuit) named the Tensor Processing Unit (TPU), which it is unveiling today…
The aim, the search giant says, was to see what they could accomplish with custom accelerators for machine learning applications.
The TensorFlow-tailored TPU, which has been running inside the firm’s data centers for more than a year now, delivered “an order of magnitude better-optimized performance per watt for machine learning”.
To put that in perspective, Google said the improvement is roughly equivalent to fast-forwarding technology about seven years into the future, or three generations of Moore’s Law.
Ultimately, it all comes down to crazy optimization. With fewer transistors per operation required, more operations can be squeezed into the silicon, making all machine learning-based operations markedly faster.
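The “fewer transistors per operation” point refers to reduced-precision arithmetic: Google has noted that the TPU computes with 8-bit integers rather than 32-bit floats, so each multiply needs far less silicon. As a rough illustration only (not Google’s implementation), here is a minimal sketch of symmetric linear quantization, mapping float32 weights to int8 and back; the function names and the single-scale scheme are illustrative assumptions:

```python
import numpy as np

def quantize_int8(x):
    """Map float32 values to int8 with one symmetric linear scale.
    (Illustrative sketch, not the TPU's actual scheme.)"""
    scale = np.max(np.abs(x)) / 127.0  # largest magnitude maps to +/-127
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 0.98], dtype=np.float32)
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# int8 storage uses a quarter of the bits of float32; the price is a
# rounding error of at most half a quantization step per value
```

The trade-off this sketch shows is the core idea: each value loses a little precision, but the hardware needed per multiply-accumulate shrinks dramatically, which is what lets more operations fit on the same die.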
The board itself also manages to be compact and functional, so that a TPU-equipped unit fits easily into a hard disk drive slot in the company’s data center racks.
The team is also proud of how quickly it turned this idea into reality; it took no more than 22 days to have a functional unit up and running in its data centers, which is indeed impressive.
Of course, Google’s ultimate goal is to be the industry-leading machine learning company, which also means making its products available to developers and, in turn, customers.
“Building TPUs into our infrastructure stack will allow us to bring the power of Google to developers across software like TensorFlow and Cloud Machine Learning with advanced acceleration capabilities,” the company said.
“Machine Learning is transforming how developers build intelligent applications that benefit customers and consumers, and we’re excited to see the possibilities come to life.”