Mean Normalization

In addition to scaling the features, we can also apply Mean Normalization.

Here, we replace $x_i$ with $x_i - \mu_i$ so that the features have approximately zero mean.

(Note: this is not applied to $x_0$, which has the fixed value 1.)

In general, we can use the following formula to scale the features using mean normalization:

$x_i := \dfrac{x_i - \mu_i}{S_i}$

where $x_i$ is the $i^{th}$ feature, $\mu_i$ is its mean, and $S_i$ is its range (i.e. max − min).

If this puts each $x_i$ approximately in the range $[-0.5, 0.5]$, gradient descent will converge quickly.
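The formula above can be sketched in NumPy as follows. This is a minimal illustration with a made-up feature matrix (the data values are hypothetical); the intercept column $x_0 = 1$ is assumed to be excluded, per the note above:

```python
import numpy as np

# Hypothetical training set: rows are examples, columns are features
# (e.g. house size and number of bedrooms). x_0 = 1 is NOT included here.
X = np.array([[2000.0, 3.0],
              [1600.0, 2.0],
              [2400.0, 4.0],
              [1400.0, 2.0]])

mu = X.mean(axis=0)                # mu_i: per-feature mean
S = X.max(axis=0) - X.min(axis=0)  # S_i: per-feature range (max - min)

X_norm = (X - mu) / S              # mean normalization: (x_i - mu_i) / S_i

print(X_norm.mean(axis=0))         # each column now has approximately zero mean
```

After this transformation every feature has zero mean, and because each feature is divided by its range, all values fall within an interval of width 1, so they lie roughly in $[-0.5, 0.5]$.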
