Google has apologized for Gmail delivery delays experienced by some users yesterday. Google says that while most messages were not affected, 29% of messages were delayed by a couple of seconds, and only 1.5% were critically delayed, by as much as two hours. The issue lasted roughly 10 hours, from 5:54 a.m. until shortly before 4:00 p.m. PST.
Google has also explained what happened in significant detail, a level of transparency that is hard to find among tech companies.
The message delivery delays were triggered by a dual network failure. This is a very rare event in which two separate, redundant network paths both stop working at the same time. The two network failures were unrelated, but in combination they reduced Gmail’s capacity to deliver messages to users, and beginning at 5:54 a.m. PST messages started piling up. Google’s automated monitoring alerted the Gmail engineering team within minutes, and they began investigating immediately. Together with the networking team, the Gmail team restored some of the network capacity that was lost and worked to repurpose additional capacity, clearing much of the accumulated message backlog by 1:00 p.m. PST and the remainder by shortly before 4:00 p.m. PST.
Google notes that the issue has now been resolved and that Gmail service was not impaired during the troubleshooting operation. Going forward, Google has pledged that its “top priority” is to prevent issues like this as far as possible. It is adding extra network capacity so that Gmail can keep running even in the worst of cases.