Sliding Window Detection

Once a CNN is trained using cropped images that contain just the target object (minimal background), say a car, it will be able to detect a car in a test image with background by sliding over the image and looking for a car in parts of the image (each part is passed as an input to the CNN which classifies it as car/not car).

This is, however, computationally inefficient.

A more efficient convolutional implementation of sliding windows is discussed below.

Convolutional Implementation of Sliding Windows

If we use the following convolutional approach, the same result can be obtained in a single forward pass, and the results obtained by each slide of the window in the traditional approach would be the same as that computed in the output of the CNN i.e. each of the 8 labels obtained per row in the traditional approach would match the first row of the 8x8x4 output of the CNN.

Last updated