Mean Normalization

In addition to scaling the features, we can also apply Mean Normalization.

Here, we replace $x_i$ with $x_i - \mu_i$ so that the features have approximately zero mean.

(Note: this is not applied to $x_0$, which has the fixed value 1.)

In general, we can use the following formula to scale the features using mean normalization:

$x_i := \dfrac{x_i - \mu_i}{S_i}$

where $x_i$ is the $i^{th}$ feature, $\mu_i$ is its mean, and $S_i$ is its range (i.e. max − min).

If this puts each $x_i$ approximately in the range $[-0.5, 0.5]$, gradient descent will converge quickly.
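The formula above can be sketched in NumPy as follows. This is a minimal illustration with a made-up feature matrix (the data values are hypothetical); the intercept column $x_0 = 1$ is assumed to be excluded, per the note above:

```python
import numpy as np

# Hypothetical training set: rows are examples, columns are features
# (e.g. house size and number of bedrooms). x_0 = 1 is NOT included here.
X = np.array([[2000.0, 3.0],
              [1600.0, 2.0],
              [2400.0, 4.0],
              [1400.0, 2.0]])

mu = X.mean(axis=0)                # mu_i: per-feature mean
S = X.max(axis=0) - X.min(axis=0)  # S_i: per-feature range (max - min)

X_norm = (X - mu) / S              # mean normalization: (x_i - mu_i) / S_i

print(X_norm.mean(axis=0))         # each column now has approximately zero mean
```

After this transformation every feature has zero mean, and because each feature is divided by its range, all values fall within an interval of width 1, so they lie roughly in $[-0.5, 0.5]$.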
