I am having trouble understanding the weight update rule for perceptrons:
w(t + 1) = w(t) + y(t)x(t).
Assume we have a linearly separable data set.
- w is a weight vector [w0, w1, w2, ...], where w0 is the bias.
- x is an input vector [x0, x1, x2, ...], where x0 is fixed at 1 to accommodate the bias.
At iteration t, where t = 0, 1, 2, ...,
- w(t) is the set of weights at iteration t.
- x(t) is a training example that w(t) misclassifies.
- y(t) is the target output of x(t) (either -1 or 1).
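To make the setup concrete, here is a small numerical sketch of one update step (the values are arbitrary, chosen so the point starts out misclassified):

```python
import numpy as np

w = np.array([0.0, -1.0, 2.0])   # current weights; w[0] is the bias
x = np.array([1.0, 3.0, 1.0])    # input; x[0] = 1 for the bias
y = 1                            # target label, either -1 or +1

before = y * np.dot(w, x)        # <= 0 means x is misclassified by w
assert before <= 0

w_new = w + y * x                # the perceptron update rule
after = y * np.dot(w_new, x)

# Numerically, y * (w . x) increases by exactly x . x after the update,
# i.e. the point is scored "less wrong" (or correctly) than before.
print(before, after, after - before, np.dot(x, x))
```

Running this, `after - before` comes out equal to `np.dot(x, x)`, which is what I would like to understand in general terms.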
Why does this update rule move the boundary in the right direction?