I understand what gradient descent does: it moves toward a local optimum by taking small steps down the curve. What is the actual difference between plain gradient descent and Newton's method?
On Wikipedia I read this short line: "Newton's method uses curvature information to take a more direct route." What does this mean intuitively?
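To make the question concrete, here is a minimal sketch of the two update rules as I currently understand them, in one dimension. The test function f(x) = e^x - 2x, the starting point 2.0, the step size 0.1, and the function names are all my own arbitrary choices for illustration, not taken from any reference. Gradient descent scales the slope by a fixed step size, while Newton's method divides the slope by the second derivative (the curvature).

```python
import math

# Hypothetical test function: f(x) = exp(x) - 2*x, minimized at x = ln(2).
def grad(x):
    # First derivative f'(x): the slope.
    return math.exp(x) - 2.0

def hess(x):
    # Second derivative f''(x): the curvature (always positive here).
    return math.exp(x)

def gradient_descent(x, lr=0.1, steps=6):
    # Step against the slope, scaled by a fixed step size lr.
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def newton(x, steps=6):
    # Step against the slope, scaled by 1 / curvature instead of a fixed lr.
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x

x_star = math.log(2.0)        # true minimizer
gd_x = gradient_descent(2.0)  # result after 6 gradient descent steps
nt_x = newton(2.0)            # result after 6 Newton steps

print("gradient descent error:", abs(gd_x - x_star))
print("Newton's method error: ", abs(nt_x - x_star))
```

With these particular choices, Newton's method ends up much closer to the minimizer after the same number of steps. Is that what "a more direct route" refers to, or is there more to it in higher dimensions?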