Object Localization

Object localization is the task of finding where an object is in an image and drawing a bounding box around it.

Classification with Localization

CNNs are commonly used for object classification, but the same network can also localize the object at the same time.

To do this, we add four outputs, $b_x$, $b_y$, $b_h$, $b_w$, alongside the softmax output, where $(b_x, b_y)$ are the coordinates of the center of the bounding box and $b_h$, $b_w$ are its height and width respectively.
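To make the center/size parameterization concrete, here is a minimal sketch that converts it to corner coordinates, which is the form most drawing routines expect. The function name `center_to_corners` is hypothetical, and the sketch assumes coordinates are normalized to the range 0 to 1 with the origin at the top-left of the image; the notes above do not fix a coordinate convention.

```python
def center_to_corners(b_x, b_y, b_h, b_w):
    """Convert a (center, height, width) box to (x_min, y_min, x_max, y_max).

    Assumption: all values are fractions of the image size, with the
    origin at the top-left corner.
    """
    x_min = b_x - b_w / 2  # half the width to the left of the center
    y_min = b_y - b_h / 2  # half the height above the center
    x_max = b_x + b_w / 2
    y_max = b_y + b_h / 2
    return (x_min, y_min, x_max, y_max)


# A box centered in the image, half the image tall and a quarter wide.
corners = center_to_corners(b_x=0.5, b_y=0.5, b_h=0.5, b_w=0.25)
print(corners)  # (0.375, 0.25, 0.625, 0.75)
```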

Note that the training images must be annotated with bounding boxes too (the same 4 parameters) so that the network can learn to predict them.

In fact, every training image has the following vector associated with it:

$[p, b_x, b_y, b_h, b_w, c]$

where $p = 1$ if there is an object in the image, and $c$ is the label of the object.

If $p = 0$ (no object in the image), the vector becomes $[0, ?, ?, ?, ?, ?]$, where each ? denotes a "don't-care" value.
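The two cases of the label vector can be sketched as a small helper. This is an illustration only: the name `make_target` and the use of `None` to stand in for the "?" entries are assumptions, not part of the notes above.

```python
DONT_CARE = None  # stands in for the "?" entries (an assumed convention)


def make_target(box=None, label=None):
    """Build the label vector [p, b_x, b_y, b_h, b_w, c] for one image.

    box is a (b_x, b_y, b_h, b_w) tuple, or None if the image has no object.
    label is the class of the object (here an integer index).
    """
    if box is None:
        # No object: p = 0 and every other entry is a don't-care.
        return [0] + [DONT_CARE] * 5
    b_x, b_y, b_h, b_w = box
    return [1, b_x, b_y, b_h, b_w, label]


with_object = make_target(box=(0.5, 0.5, 0.5, 0.25), label=2)
without_object = make_target()
print(with_object)     # [1, 0.5, 0.5, 0.5, 0.25, 2]
print(without_object)  # [0, None, None, None, None, None]
```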

($c$ can be one-hot encoded, in which case it occupies one entry per class.)
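With a one-hot class label, the whole target is numeric, so a loss can be computed entry by entry. The sketch below shows one way the don't-care entries might be handled: when p is 0, only the p entry contributes to the loss. The squared-error form and the names `one_hot` and `masked_squared_error` are assumptions for illustration; the notes above do not specify a loss function.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def masked_squared_error(target, prediction):
    """Squared error that skips the don't-care entries when p = 0.

    Both arguments are laid out as [p, b_x, b_y, b_h, b_w, c_1, ..., c_K].
    """
    if target[0] == 0:
        # No object: only the presence score p is compared.
        return (prediction[0] - target[0]) ** 2
    # Object present: every entry contributes.
    return sum((y_hat - y) ** 2 for y, y_hat in zip(target, prediction))


# Object of class 2 (out of 3 classes): the vector has 5 + 3 entries.
target = [1.0, 0.5, 0.5, 0.5, 0.25] + one_hot(2, 3)

# No object: entries after p are placeholders and are never read.
no_object = [0.0] + [0.0] * 7
prediction = [0.5, 0.9, 0.9, 0.9, 0.9, 0.3, 0.3, 0.4]
print(masked_squared_error(no_object, prediction))  # 0.25
```

Whatever the network predicts for the box and class entries has no effect on the loss for an image without an object, which is what "don't-care" means in practice.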
