Cloud Vision API
What is Cloud Vision API?
Google Cloud Vision API is a powerful cloud-based solution that packages image analytics capabilities into an easy-to-use API. It offers a broad set of features, including OCR (text detection), face detection, logo detection, and landmark detection, as well as label detection, which underpins its classification framework. The API provides image recognition and classification services for images of various sizes, formats, and orientations, all of which can be processed in real time. With the advent of AI technology, the market is now flooded with applications that rely on computer vision, from driverless cars to visual search based on captured images.
As a developer, you can use Vision API to build applications with image recognition and classification capabilities without having to worry about hardware processing power or cleansing large volumes of training data. Unlike Google Cloud AutoML, which lets you train your own models, Vision API packages these capabilities into pre-trained models hosted in the cloud. Image recognition and classification is a highly intricate process that Vision API simplifies significantly. How exactly does this process work?
How does image recognition and classification work with Cloud Vision API?
In image recognition, a device or application understands the characteristics of an image by analyzing its content, much as the human eye does. It works on the basic principle that an image consists of two components: pixels and features. A feature is a characteristic that stands out in an image and helps an object be recognized automatically; three kinds of information matter most here: size, shape, and color. Cloud Vision API uses all three to form a feature vector, then compares that feature vector with previously known vectors in its database to determine whether they match.
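To make the feature-vector idea concrete, here is a deliberately simplified Python sketch. It is not how Cloud Vision is actually implemented; the vector components and the small "database" of known vectors are purely illustrative.

```python
# Toy illustration of the feature-vector idea described above. This is NOT how
# Cloud Vision is implemented; it only shows size, shape, and color information
# being combined into a vector and matched against previously known vectors.
import numpy as np

def feature_vector(image: np.ndarray) -> np.ndarray:
    """Build a crude feature vector from an H x W x 3 RGB image."""
    height, width, _ = image.shape
    size = [float(height), float(width)]            # size information
    shape = [height / width]                        # a very rough "shape" cue
    color = image.reshape(-1, 3).mean(axis=0)       # average R, G, B values
    return np.concatenate([size, shape, color])

def best_match(query: np.ndarray, known: dict) -> str:
    """Return the label of the known vector most similar to the query."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(known, key=lambda label: cosine(query, known[label]))

# Hypothetical "database" of previously known feature vectors.
known_vectors = {
    "beach":  np.array([480.0, 640.0, 0.75, 180.0, 200.0, 220.0]),
    "forest": np.array([480.0, 640.0, 0.75, 60.0, 120.0, 50.0]),
}

# A synthetic greenish image stands in for a camera capture.
query_image = np.full((480, 640, 3), (70, 130, 60), dtype=np.uint8)
print(best_match(feature_vector(query_image), known_vectors))   # -> forest
```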
Image classification is based on Google's machine learning algorithms, which are trained on a large number of example images with labels specifying what type of content they contain. The software assigns each image a set of labels (a "label set") that describes the type of content detected in it. These label sets can be used to categorize images into groups, filter them according to their content, or search for them by specific label names (such as "sports" or "snow").
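In practice, developers request these labels through the client libraries. Below is a minimal sketch using the google-cloud-vision Python client; it assumes the library is installed, Google Cloud credentials are configured, and "photo.jpg" is a placeholder for a local image file.

```python
# Minimal label-detection sketch with the google-cloud-vision Python client.
# Assumes `pip install google-cloud-vision` and configured Google Cloud
# credentials; "photo.jpg" is a placeholder for a local image file.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)

# Each annotation carries a human-readable description and a confidence score.
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
```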
These models can be accessed through Google's client SDKs and REST API; under the hood they are built with TensorFlow, Google's open-source machine learning framework, and trained on a large collection of labeled images hosted in Google Cloud Storage. Cloud Storage hosts petabytes of image data, enhancing the efficiency of this service. Developers can incorporate the pre-trained models into their solutions, or they can create custom models (for example, with AutoML) which they can then use to process images and return classification predictions.
Apart from objects, Cloud Vision API can recognize emotions on a human face. A device connected to Vision API in the cloud can send a face image captured by its camera for processing. The API detects the face along with the associated emotions; it can also detect other entities within the image. How does it do this?
Face detection works by locating faces in the picture and returning the relative positions of landmarks such as the eyes, nose, and mouth. Variations in the shape of these landmarks are used to infer the type of emotion. The result of this analysis is returned as a JSON response in near real time.
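The snippet below is a sketch of such a request with the google-cloud-vision Python client: it prints the likelihood the API assigns to several emotions, along with the detected landmark positions. "portrait.jpg" is a placeholder file name, and the usual installation and credential setup are assumed.

```python
# Sketch of face detection with the google-cloud-vision Python client; the
# response contains landmark positions (eyes, nose, mouth, ...) and emotion
# likelihoods. "portrait.jpg" is a placeholder for a local image file.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("portrait.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.face_detection(image=image)

# Emotions are reported as likelihood buckets rather than raw probabilities.
likelihood = ("UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
              "POSSIBLE", "LIKELY", "VERY_LIKELY")

for face in response.face_annotations:
    print("joy:", likelihood[face.joy_likelihood])
    print("sorrow:", likelihood[face.sorrow_likelihood])
    print("anger:", likelihood[face.anger_likelihood])
    print("surprise:", likelihood[face.surprise_likelihood])
    # Relative positions of facial landmarks such as eyes, nose, and mouth.
    for landmark in face.landmarks:
        print(landmark.type_, landmark.position.x, landmark.position.y)
```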
You can already see how useful this solution is: The New York Times, for example, is using the service to digitize its image archives going back decades.