Over the past year, Google has instituted a number of efforts to find harmful apps in the Play Store. The latest applies machine learning at large scale to a technique known as peer group analysis, which compares apps in order to find malicious ones.
Specifically, Google is looking to weed out apps that request and send out too much user data.
Using privacy and security signals, applications that are similar to each other in functionality are grouped together. These peer groups are established to assess the expected functionality of an app, as well as “adequate boundaries of behaviors that may be considered unsafe or intrusive.”
For example, most coloring book apps don’t need to know a user’s precise location to function, and this can be established by analyzing other coloring book apps. By contrast, mapping and navigation apps need to know a user’s location, and often require GPS sensor access.
If an app within a group requests information that most peers don’t require, it will be flagged for further vetting by Google.
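To make the flagging step concrete, here is a minimal sketch of how an app's requests could be compared against its peer group. It is an illustration only, not Google's implementation: the function name `flag_outlier_permissions`, the 50 percent threshold, the app names, and the permission labels are all assumptions made up for this example.

```python
from collections import Counter

def flag_outlier_permissions(peer_group, app_name, threshold=0.5):
    """Return permissions that app_name requests but that fewer than
    `threshold` of its peers request (hypothetical rule, for illustration)."""
    # Every app in the group except the one being checked.
    peers = {name: perms for name, perms in peer_group.items() if name != app_name}
    # How many peers request each permission.
    counts = Counter(p for perms in peers.values() for p in perms)
    n = len(peers)
    if n == 0:
        return []  # no peers to compare against
    return sorted(p for p in peer_group[app_name] if counts[p] / n < threshold)

# Made-up peer group of coloring book apps and the permissions they request.
coloring_books = {
    "ColorFun": {"STORAGE"},
    "PaintPals": {"STORAGE", "CAMERA"},
    "CrayonKids": {"STORAGE"},
    "SketchyBook": {"STORAGE", "FINE_LOCATION"},
}

flagged = flag_outlier_permissions(coloring_books, "SketchyBook")
print(flagged)
```

In this toy data, only one coloring book app asks for precise location, so that request is the one singled out for a closer look. A real system would presumably weigh many more signals than a simple share of peers.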
The manual creation of peer groups is error-prone and time-consuming, so Google has turned to machine learning in order to “[cluster] mobile apps with similar capabilities.”
In Google’s words: “Our approach uses deep learning of vector embeddings to identify peer groups of apps with similar functionality, using app metadata, such as text descriptions, and user metrics, such as installs.”
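The idea of grouping apps by embedding can be sketched in a few lines. This is a rough illustration under heavy assumptions: the three-dimensional vectors below are invented by hand rather than learned by a deep model, and the greedy cosine-similarity grouping is a simple stand-in, not the clustering method Google uses. The names `cosine`, `group_peers`, and `min_similarity` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def group_peers(embeddings, min_similarity=0.9):
    """Greedily place each app in the first group whose first member is
    similar enough; otherwise start a new group (illustrative stand-in)."""
    groups = []
    for name, vec in embeddings.items():
        for group in groups:
            if cosine(vec, embeddings[group[0]]) >= min_similarity:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hand-made toy vectors; a real system would learn these from
# app descriptions and usage metrics.
toy_embeddings = {
    "ColorFun": [0.9, 0.1, 0.0],
    "PaintPals": [0.8, 0.2, 0.1],
    "RoadNav": [0.1, 0.9, 0.3],
    "CityMaps": [0.0, 0.8, 0.4],
}

peer_groups = group_peers(toy_embeddings)
print(peer_groups)
```

With these toy vectors, the two coloring apps land in one group and the two navigation apps in another. Those groups could then serve as the baseline against which each app's data requests are judged.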