I am having trouble understanding the weight update rule for perceptrons:
w(t + 1) = w(t) + y(t)x(t).
Assume we have a linearly separable data set.
- w is a weight vector [w0, w1, w2, ...], where w0 is the bias.
- x is an input vector [x0, x1, x2, ...], where x0 is fixed at 1 to accommodate the bias.
At iteration t, where t = 0, 1, 2, ...,
- w(t) is the set of weights at iteration t.
- x(t) is a training example that w(t) misclassifies.
- y(t) is the target output of x(t) (either -1 or 1).
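To make the setup concrete, here is a small numerical sketch of one update step (the values are arbitrary, chosen so the point starts out misclassified):

```python
import numpy as np

w = np.array([0.0, -1.0, 2.0])   # current weights; w[0] is the bias
x = np.array([1.0, 3.0, 1.0])    # input; x[0] = 1 for the bias
y = 1                            # target label, either -1 or +1

before = y * np.dot(w, x)        # <= 0 means x is misclassified by w
assert before <= 0

w_new = w + y * x                # the perceptron update rule
after = y * np.dot(w_new, x)

# Numerically, y * (w . x) increases by exactly x . x after the update,
# i.e. the point is scored "less wrong" (or correctly) than before.
print(before, after, after - before, np.dot(x, x))
```

Running this, `after - before` comes out equal to `np.dot(x, x)`, which is what I would like to understand in general terms.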
Why does this update rule move the boundary in the right direction?