The YOLO (You Only Look Once) Algorithm

A better algorithm that tackles the issue of predicting accurate bounding boxes while using the convolutional sliding window technique is the YOLO algorithm. YOLO stands for you only look onceand was developed in 2015 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. It’s popular because it achieves high accuracy while running in real time. This algorithm is called so because it requires only one forward propagation pass through the network to make the predictions.

The algorithm divides the image into grids and runs the image classification and localization algorithm (discussed under object localization) on each of the grid cells. For example, we have an input image of size 256 × 256. We place a 3 × 3 grid on the image (see Fig. 8).

Fig. 8 Grid (3 x 3) representation of the image

Next, we apply the image classification and localization algorithm on each grid cell. For each grid cell, the target variable is defined as

yi,j=[pcbxbybhbwc1c2c3c4]T(6)

Do everything once with the convolution sliding window. Since the shape of the target variable for each grid cell is 1 × 9 and there are 9 (3 × 3) grid cells, the final output of the model will be:

The advantages of the YOLO algorithm is that it is very fast and predicts much more accurate bounding boxes. Also, in practice to get more accurate predictions, we use a much finer grid, say 19 × 19, in which case the target output is of the shape 19 × 19 × 9.

Conclusion

With this, we come to the end of the introduction to object detection. We now have a better understanding of how we can localize objects while classifying them in an image. We also learned to combine the concept of classification and localization with the convolutional implementation of the sliding window to build an object detection system. In the next blog, we will go deeper into the YOLO algorithm, loss function used, and implement some ideas that make the YOLO algorithm better. Also, we will learn to implement the YOLO algorithm in real time.