Last year, Google introduced a series of mobile-first computer vision neural networks that allow for image classification and detection while remaining fast and low-power under the constraints of running on-device. Today, the company is making MobileNetV2 available with several performance improvements.
This “next generation of on-device computer vision networks” builds on MobileNetV1 and adds two new features to the architecture:
1) Linear bottlenecks between the layers. The intuition is that the bottlenecks encode the model’s intermediate inputs and outputs, while the inner layer encapsulates the model’s ability to transform from lower-level concepts such as pixels to higher-level descriptors such as image categories.
2) Shortcut connections between the bottlenecks. As with traditional residual connections, shortcuts enable faster training and better accuracy.
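The two features above can be sketched as a single building block: a 1x1 expansion, an inner transformation, a linear 1x1 projection back down to the bottleneck, and a shortcut connection adding the block’s input to its output. The sketch below illustrates that structure in plain NumPy; the weight shapes, the expansion factor of 6, and the use of a simple per-channel scaling in place of a real 3x3 depthwise convolution are illustrative assumptions, not the actual MobileNetV2 implementation.

```python
import numpy as np

def relu6(x):
    # ReLU capped at 6, the activation used in the MobileNet family.
    return np.minimum(np.maximum(x, 0.0), 6.0)

def inverted_residual_block(x, w_expand, w_depthwise, w_project):
    """Illustrative MobileNetV2-style block on an (H, W, C) feature map."""
    # 1x1 "expand" convolution: lift the bottleneck into a wider space.
    h = relu6(x @ w_expand)            # (H, W, C) -> (H, W, C * t)
    # Stand-in for the inner depthwise step: per-channel scaling here;
    # the real model applies a 3x3 spatial filter per channel.
    h = relu6(h * w_depthwise)
    # Linear 1x1 "project" back to the bottleneck: no activation, so the
    # low-dimensional representation is not clipped.
    h = h @ w_project                  # (H, W, C * t) -> (H, W, C)
    # Shortcut connection between the bottlenecks, as in residual nets.
    return x + h

# Tiny example: a 4x4 feature map with 8 channels, expansion factor t = 6.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))
out = inverted_residual_block(
    x,
    w_expand=rng.standard_normal((8, 48)) * 0.1,
    w_depthwise=rng.standard_normal(48) * 0.1,
    w_project=rng.standard_normal((48, 8)) * 0.1,
)
print(out.shape)  # (4, 4, 8): output matches the input, so blocks can stack
```

Because the input and output of the block share the bottleneck shape, the shortcut is a plain element-wise addition and blocks can be chained end to end.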
This results in MobileNetV2 being faster overall while maintaining the same accuracy, and in some cases it even achieves higher accuracy:
In particular, the new models use 2x fewer operations, need 30% fewer parameters and are about 30-40% faster on a Google Pixel phone than MobileNetV1 models, all while achieving higher accuracy.
Other improvement areas include object detection and on-device semantic segmentation, which is responsible for features like swapping out backgrounds without the need for a green screen in YouTube, as well as Portrait mode.
MobileNetV2 is released as part of the TensorFlow-Slim Image Classification Library, or you can start exploring MobileNetV2 right away in Colaboratory. Alternatively, you can download the notebook and explore it locally using Jupyter. MobileNetV2 is also available as modules on TF-Hub, and pretrained checkpoints can be found on GitHub.