Machine learning inherently requires large amounts of data to work and identify patterns. In recent years, there has been a push to ensure that the information used remains private. Google today announced an open-source differential privacy library that its own products use.
Differentially private data analysis is a principled approach that enables organizations to learn from the majority of their data while simultaneously ensuring that those results do not allow any individual’s data to be distinguished or re-identified.
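To make the idea concrete, here is a minimal Python sketch of the classic Laplace mechanism applied to a count. It is not the API of Google's library; the names `laplace_noise` and `dp_count` are hypothetical. It assumes each person contributes at most one record, so adding or removing one person changes the true count by at most 1, and noise with scale 1/`epsilon` hides that change. Smaller `epsilon` means more noise and stronger privacy. Production libraries also guard against floating-point weaknesses in naive noise sampling, which this sketch does not.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with rate 1/scale
    # is Laplace-distributed with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records: Iterable[T], predicate: Callable[[T], bool],
             epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Hypothetical epsilon-DP count via the Laplace mechanism.

    Assumes one record per person, so the count's sensitivity is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Noise scale = sensitivity / epsilon = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 19, 44]
    # The true answer is 4; the released value is 4 plus Laplace(0, 1) noise.
    print(dp_count(ages, lambda age: age >= 40, epsilon=1.0))
```

Because only the noisy total is released, an observer cannot tell with confidence whether any single person's record was in the data.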
According to the company, this differential privacy library “helps power some of Google’s core products.” It lets developers and organizations implement features that can otherwise be “difficult to execute from scratch.” Google says it focused especially on making the library easy to use and deploy:
- Statistical functions: Most common data science operations are supported by this release. Developers can compute counts, sums, averages, medians, and percentiles using our library.
- Rigorous testing: Getting differential privacy right is challenging. Besides an extensive test suite, we’ve included an extensible ‘Stochastic Differential Privacy Model Checker library’ to help prevent mistakes.
- Ready to use: The real utility of an open-source release is in answering the question, ‘Can I use this?’ That’s why we’ve included a PostgreSQL extension along with common recipes to get you started. We’ve described the details of our approach in a technical paper that we’ve just released today.
- Modular: We designed the library so that it can be extended to include other functionalities such as additional mechanisms, aggregation functions, or privacy budget management.
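Statistics such as sums and averages need a bound on how much one person can move the result. A common textbook approach, sketched below in Python, is to clamp every value to a caller-supplied range before adding noise. The helpers `dp_bounded_sum` and `dp_bounded_mean` are hypothetical and are not the library's actual interface or algorithm; they assume one value per person and split `epsilon` evenly between a noisy sum and a noisy count.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # Difference of two i.i.d. exponentials (rate 1/scale) is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clamp(x: float, lower: float, upper: float) -> float:
    return max(lower, min(upper, x))


def dp_bounded_sum(values: Sequence[float], lower: float, upper: float,
                   epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Hypothetical epsilon-DP sum, assuming each person contributes one value."""
    if epsilon <= 0 or lower > upper:
        raise ValueError("need epsilon > 0 and lower <= upper")
    if rng is None:
        rng = random.Random()
    total = sum(clamp(v, lower, upper) for v in values)
    # Clamping caps any one person's influence on the sum at max(|lower|, |upper|).
    sensitivity = max(abs(lower), abs(upper))
    if sensitivity == 0:
        return total  # every clamped value is 0, so there is nothing to hide
    return total + laplace_noise(sensitivity / epsilon, rng)


def dp_bounded_mean(values: Sequence[float], lower: float, upper: float,
                    epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Hypothetical epsilon-DP mean: half the budget for the sum, half for the count."""
    if epsilon <= 0 or lower > upper:
        raise ValueError("need epsilon > 0 and lower <= upper")
    if rng is None:
        rng = random.Random()
    noisy_sum = dp_bounded_sum(values, lower, upper, epsilon / 2, rng)
    # The count has sensitivity 1, so its noise scale is 1 / (epsilon / 2).
    noisy_count = max(1.0, len(values) + laplace_noise(2.0 / epsilon, rng))
    # Clamping the ratio back into range is post-processing and costs no budget.
    return clamp(noisy_sum / noisy_count, lower, upper)
```

Splitting the budget relies on sequential composition: two releases at `epsilon / 2` each cost `epsilon` in total. Medians and percentiles need different mechanisms and are not covered by this sketch.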
One Google service that leverages differential privacy is Maps, given its large number of crowdsourced user contributions. This includes the Popular Times feature, which shows how busy a location is, as well as popular dishes. It’s also used by the Google Fi MVNO and Gboard.
In March, Google also brought differential privacy techniques to third-party ML developers with TensorFlow Privacy and TensorFlow Federated. The company says it is investing in new privacy technologies and wants to see them deployed more widely.
“We’re excited to make this library broadly available and hope developers will consider leveraging it as they build out their comprehensive data privacy strategies. From medicine, to government, to business, and beyond, it’s our hope that these open-source tools will help produce insights that benefit everyone,” the company said.