The YOLO Algorithm

With sliding windows, the bounding box is restricted to the window positions and sizes that we slide over the image, so it may not fit the object tightly, as shown below:

A more accurate way to get bounding boxes is the YOLO (You Only Look Once) algorithm.

The crux of this algorithm is to divide the image into a grid of cells and apply the CNN-based object localization algorithm (as described earlier) to each grid cell. It outputs accurate bounding boxes as long as each cell contains at most one object.

Note that during training, each object is assigned to exactly one grid cell: the cell that contains the center of its bounding box. Therefore, in the image above, the central cell is treated as empty, even though it contains part of a car.
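
The center-based assignment rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `assign_to_cell` and `build_target`, the 300x300 image, the 3x3 grid, the example box, and the per-cell label layout [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_C] (borrowed from the usual object localization setup) are all hypothetical choices made for this example.

```python
def assign_to_cell(box, image_w, image_h, grid_size):
    """Find the grid cell that contains the center of a bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels (assumed format).
    Returns (row, col, bx, by), where bx and by are the center offsets
    inside the cell, expressed as fractions of the cell size.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    # Clamp so a center on the right or bottom image edge stays in the grid.
    col = min(int(cx // cell_w), grid_size - 1)
    row = min(int(cy // cell_h), grid_size - 1)
    bx = cx / cell_w - col
    by = cy / cell_h - row
    return row, col, bx, by


def build_target(objects, image_w, image_h, grid_size, num_classes):
    """Build a grid_size x grid_size grid of label vectors.

    objects is a list of (box, class_id) pairs. Each cell holds
    [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_C]; empty cells stay all zeros.
    """
    depth = 5 + num_classes
    target = [[[0.0] * depth for _ in range(grid_size)]
              for _ in range(grid_size)]
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    for box, class_id in objects:
        row, col, bx, by = assign_to_cell(box, image_w, image_h, grid_size)
        x_min, y_min, x_max, y_max = box
        # Height and width relative to the cell size (an assumption);
        # these can exceed 1 when the object spans several cells.
        bh = (y_max - y_min) / cell_h
        bw = (x_max - x_min) / cell_w
        one_hot = [0.0] * num_classes
        one_hot[class_id] = 1.0
        # In this sketch a later object overwrites an earlier one that
        # lands in the same cell: one object per cell.
        target[row][col] = [1.0, bx, by, bh, bw] + one_hot
    return target


# A car whose box overlaps the central cell, but whose center lies in
# the cell to its right.
car_box = (150, 130, 290, 230)
row, col, bx, by = assign_to_cell(car_box, 300, 300, 3)
target = build_target([(car_box, 0)], 300, 300, 3, num_classes=3)
print(row, col)          # cell that owns the car
print(target[1][1][0])   # p_c of the central cell
print(target[row][col])  # full label vector of the owning cell
```

Here the box covers part of the central cell (row 1, column 1), yet only the cell at row 1, column 2 receives a non-zero label, because that is where the box center falls.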

Because YOLO has a convolutional implementation, all the grid cells are processed simultaneously in a single forward pass rather than one at a time. This makes prediction fast as well as accurate, which is why YOLO is also used for real-time object detection.
