Local and global minima of the cost function in logistic regression

I think I'm misunderstanding the idea behind the minima in the derivation of logistic regression.

The idea is to push the hypothesis's output as high as possible (i.e. a correct-prediction probability as close to 1 as possible), which in turn requires minimising the cost function $J(\theta)$ as much as possible.

Now I've been told that for all of this to work, the cost function must be convex. My understanding of convexity is that it forbids maxima, and therefore there can only be one minimum, the global minimum. Is this really the case? If it's not, please explain why not. Also, if it's not the case, that implies the possibility of multiple minima in the cost function, implying multiple sets of parameters yielding higher and higher probabilities. Is this possible? Or can I be certain the returned parameters refer to the global minimum and hence the highest probability/prediction?
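
For concreteness, by $J(\theta)$ I mean the usual cross-entropy cost with the hypothesis $h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$

where $m$ is the number of training examples.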

Fabricate answered 9/10, 2016 at 13:7 Comment(2)
(1) The logistic regression problem is convex. (2) Because it's convex, local minimum = global minimum. (3) Regularization is a very important approach within this task, e.g. adding some cost to penalize the weights. (4) L2-based regularization has only one solution. (5) L1-based regularization might have multiple solutions with the same objective value; the problem is still convex. (6) There are algorithms that do not guarantee convergence to the optimum, like SGD-based approaches; they are still important in large-scale optimization. – Unconsidered
Could you please elaborate or give some reference for the L1 and L2 part: how do they change the solution? How can L1 have multiple solutions and still be convex? Also, does doing gradient updates in mini-batch style, or using some optimizer for the learning rate, change the convexity of the method or solution? – Manila
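
One way to see why both penalties leave the problem convex while only L2 forces a unique solution: the plain logistic cost has Hessian $\frac{1}{m}X^\top S X$ with $S = \operatorname{diag}(h(1-h))$, which is positive semi-definite, and adding the L2 term $\frac{\lambda}{2}\lVert\theta\rVert^2$ shifts every eigenvalue up by $\lambda > 0$, making the cost strictly convex (hence a unique minimizer); the L1 term is convex but not strictly so, which is what allows ties. A minimal numpy sketch of the Hessian check, where the data, sizes, and $\lambda$ are all assumptions for illustration:

```python
import numpy as np

# Minimal sketch: verify that the logistic-regression Hessian is positive
# semi-definite (convex cost), and strictly positive definite once an L2
# penalty is added (strictly convex cost, hence a unique minimizer).
rng = np.random.default_rng(0)
m, n = 200, 5
X = rng.normal(size=(m, n))              # synthetic features (assumption)
theta = rng.normal(size=n)               # arbitrary point; PSD holds at every theta

h = 1.0 / (1.0 + np.exp(-X @ theta))     # sigmoid hypothesis h_theta(x)
S = np.diag(h * (1.0 - h))               # per-example weights h(1-h)

H_plain = X.T @ S @ X / m                # Hessian of the unregularized cost
lam = 0.1                                # assumed regularization strength
H_l2 = H_plain + lam * np.eye(n)         # Hessian with (lam/2)*||theta||^2 added

print(np.linalg.eigvalsh(H_plain).min()) # >= 0: convex
print(np.linalg.eigvalsh(H_l2).min())    # >= lam > 0: strictly convex
```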

Using a convex cost function does not by itself guarantee a convex problem.

There is a distinction between a convex cost function and a convex method.

The typical cost functions you encounter (cross entropy, absolute loss, least squares) are designed to be convex.

However, the convexity of the problem also depends on the type of ML model you use.

Linear models (linear regression, logistic regression, etc.) give you convex optimization problems, that is, gradient descent converges to the global minimum. When using neural nets with hidden layers, however, you are no longer guaranteed a convex problem.

Thus, convexity is a property of your method, not only of your cost function!

LR is a linear classification method, so you get a convex optimization problem each time you use it! One caveat: if the data is perfectly linearly separable, the unregularized problem has no finite minimizer (the cost keeps decreasing as the weights grow without bound), and if the classes overlap heavily, a minimizer exists but it may simply not be a good classifier.
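
To make the local-minimum-equals-global-minimum point concrete, here is a minimal sketch (the synthetic data, fixed step size, and iteration count are all assumptions) showing gradient descent on the logistic cost reaching the same parameters from very different starting points:

```python
import numpy as np

# Minimal sketch: gradient descent on the logistic-regression cost from
# several random initializations lands on (numerically) the same minimum,
# because the problem is convex. Assumes noisy, non-separable synthetic data.
rng = np.random.default_rng(1)
m, n = 300, 3
X = rng.normal(size=(m, n))
true_theta = np.array([1.5, -2.0, 0.5])
y = (1 / (1 + np.exp(-X @ true_theta)) > rng.random(m)).astype(float)  # noisy labels

def grad(theta):
    """Gradient of the mean cross-entropy cost: X^T (h - y) / m."""
    h = 1 / (1 + np.exp(-X @ theta))
    return X.T @ (h - y) / m

for trial in range(3):
    theta = rng.normal(scale=5.0, size=n)   # deliberately different start points
    for _ in range(5000):
        theta -= 0.5 * grad(theta)          # plain GD, fixed step (assumption)
    print(np.round(theta, 3))               # all trials print the same vector
```

All three trials end at (numerically) the same parameter vector, which is exactly what convexity buys you.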

Landsknecht answered 11/1, 2017 at 15:23 Comment(0)

Yes. Logistic regression and linear regression aim to find weights and biases that improve the accuracy of the model (or, say, that work well with high probability on test data, or real-world data). To achieve that, we try to find weights and biases such that the deviation (say, cost) between predictions and real outcomes is as small as possible. So if we plot the cost function and find its minimum, we achieve the same purpose. Hence we use a model whose cost function has a single minimum (i.e. the cost function should be convex).
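
As an illustration, here is a minimal sketch (the one-dimensional synthetic data is an assumption) that traces the cost over a grid of weights for a one-parameter model; the resulting curve is a single convex bowl with one minimum:

```python
import numpy as np
import matplotlib.pyplot as plt

# Minimal sketch: for a one-parameter logistic model h(x) = sigmoid(w * x),
# the cross-entropy cost traced over a grid of w values is one convex bowl.
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = (x + 0.5 * rng.normal(size=100) > 0).astype(float)  # noisy 1-D labels

ws = np.linspace(-10, 10, 400)
costs = []
for w in ws:
    h = 1 / (1 + np.exp(-w * x))
    h = np.clip(h, 1e-12, 1 - 1e-12)        # guard against log(0)
    costs.append(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

plt.plot(ws, costs)                          # one global bowl, no other dips
plt.xlabel("w"); plt.ylabel("J(w)")
plt.show()
```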

Fact answered 5/7, 2018 at 7:31 Comment(0)
