Cost Function, Linear Regression, trying to avoid hard coding theta. Octave.

10

58

I'm in the second week of Professor Andrew Ng's Machine Learning course through Coursera. We're working on linear regression and right now I'm dealing with coding the cost function.

The code I've written solves the problem correctly but does not pass the submission process and fails the unit test because I have hard coded the values of theta and not allowed for more than two values for theta.

Here's the code I've got so far

function J = computeCost(X, y, theta)

m = length(y);
J = 0;

for i = 1:m,
    h = theta(1) + theta(2) * X(i)
    a = h - y(i);
    b = a^2;
    J = J + b;
    end;
J = J * (1 / (2 * m));

end

the unit test is

computeCost( [1 2 3; 1 3 4; 1 4 5; 1 5 6], [7;6;5;4], [0.1;0.2;0.3])

and should produce ans = 7.0175

So I need to add another for loop to iterate over theta, therefore allowing for any number of values for theta, but I'll be damned if I can wrap my head around how/where.

Can anyone suggest a way I can allow for any number of values for theta within this function?

If you need more information to understand what I'm trying to ask, I will try my best to provide it.

Debbee answered 25/3, 2014 at 4:22 Comment(0)
A
90

You can vectorize operations in Octave/MATLAB. Iterating over an entire vector is a bad idea if your programming language lets you vectorize operations; R, Octave, MATLAB, and Python (numpy) all allow this. For example, if theta = (t0, t1, t2, t3) and X = (x0, x1, x2, x3), you can get their scalar product as follows:

theta * X' = (t0, t1, t2, t3) * (x0, x1, x2, x3)' = t0*x0 + t1*x1 + t2*x2 + t3*x3

The result is a scalar.

For example, you can vectorize h in your code as follows:

H = (theta'*X')';
S = sum((H - y) .^ 2);
J = S / (2*m);
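
As a sanity check on the three lines above, here is a minimal sketch in plain Python (no numpy, so the loops that Octave hides are spelled out) that runs the unit test from the question. The name `compute_cost` and the list-of-rows layout for X are assumptions made for this illustration, not part of the course code.

```python
def compute_cost(X, y, theta):
    """Linear-regression cost J for any number of theta values.

    X is a list of rows (each row already includes the leading 1);
    y and theta are flat lists. All names here are illustrative.
    """
    m = len(y)
    # H = X * theta: one dot product per training example
    H = [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]
    # S = sum((H - y) .^ 2)
    S = sum((h_i - y_i) ** 2 for h_i, y_i in zip(H, y))
    return S / (2 * m)


# Unit test data from the question
X = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]]
y = [7, 6, 5, 4]
theta = [0.1, 0.2, 0.3]

J = compute_cost(X, y, theta)
print(round(J, 4))  # the question's unit test expects 7.0175
```

Because the inner sum runs over however many entries theta has, nothing is hard coded to two parameters, which is the property the grader checks.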
Angeli answered 25/3, 2014 at 7:47 Comment(8)
Have you done away with the for loop there? And if I read that right, you've written (theta transpose * X transpose) transpose. – Debbee
Yes, these three lines of code replace the entire loop! And yes, that is a transpose (I use Octave syntax). – Angeli
I think you have used capitals for the variables here as a matter of convention for naming matrix variables, so thank you for reminding me about that. What I don't understand is the line "S = sum((H - y).^2);": what's the "."? I know I've seen it before but I can't recall its purpose. – Debbee
The dot in matrix arithmetic is used for element-by-element operations. For example, with A = [ 1 2 ; 3 4 ] and B = [ 3 4 ; 1 2 ]: A*B = [ 5 8 ; 13 20 ] (ordinary matrix multiplication), while A.*B = [ 3 8 ; 3 8 ] (element-by-element multiplication, i.e. [ 1*3 2*4 ; 3*1 4*2 ]). Similarly, A.^2 = [ 1^2 2^2 ; 3^2 4^2 ] = [ 1 4 ; 9 16 ]. – Angeli
OK, it took me quite a while to understand why that code works, but it does. Thanks. – Debbee
Why didn't you use "ones(1,97)' * ((X*theta)-y).^2"? – Carborundum
The way you created H is an absolute masterpiece. – Heteroplasty
Hi guys, I know it's been a while, but why did you transpose three times in H? The H formula is H = theta' * X. – Rorrys
O
41

The above answer is perfect, but you can also do:

H = (X*theta);
S = sum((H - y) .^ 2);
J = S / (2*m);

Rather than computing

(theta' * X')'

and then taking the transpose you can directly calculate

(X * theta)

It works perfectly.
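
To see that the two forms agree, here is a small sketch in plain Python that builds both `(theta' * X')'` and `X * theta` on the question's unit test data and compares them. The helpers `transpose` and `matmul` are hypothetical stand-ins written for this example; in Octave the `'` and `*` operators do this work.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    Bt = transpose(B)  # columns of B, so each entry is a row-by-column dot product
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


X = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]]  # 4x3, one training example per row
theta = [[0.1], [0.2], [0.3]]                      # 3x1 column vector

direct = matmul(X, theta)                                        # X * theta, 4x1
roundabout = transpose(matmul(transpose(theta), transpose(X)))   # (theta' * X')', 4x1

same = all(abs(d[0] - r[0]) < 1e-12 for d, r in zip(direct, roundabout))
print(same)
```

Both results are 4x1 columns with one hypothesis value per training example, so the shorter `X * theta` can be subtracted from `y` directly.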

Ochs answered 30/3, 2015 at 15:48 Comment(7)
Why do you need parens around X*theta? – Rajewski
You don't need them. I have a habit of adding parentheses to avoid confusion in large expressions. – Ochs
Just to be clear, the above equality X*theta = (theta'*X')' holds because of two identities: (A')' = A and A' * B' = (BA)'. The second gives theta' * X' = (X * theta)'; transposing that gives ((X * theta)')', which by the first identity equals X * theta. – Authorization
What I'm confused about is that in the equation for H(x) we have H(x) = theta' * X, but it seems that we have to take the transpose of that when implementing it in code. Why? – Nostalgia
I'm also very curious about the answer to rasen58's question, even though it was asked a long time ago. – Sagerman
@Nostalgia If anyone still cares about this, I had the same issue when trying to implement this. Basically, what I discovered is that in the cost function equation we have theta' * x. When we implement the function, we don't have x; we have the feature matrix X. x is a vector, while X is a matrix where each row is one vector x transposed. So that's where the extra transpose operations come from. – Turkish
@kennycoc Thank you for the clarification. (I reached this page after googling "theta transpose x".) :-) – Ferren
S
15

The line below returns the required cost of 32.07 when computeCost is run once with θ initialized to zeros:

J = (1/(2*m)) * (sum(((X * theta) - y).^2));

and it follows directly from the original cost function formula:

J(θ) = (1/(2m)) * Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2

Sickroom answered 4/12, 2015 at 11:41 Comment(0)
L
3

It can also be done in one line, where m is the number of training examples:

J=(1/(2*m)) * ((((X * theta) - y).^2)'* ones(m,1));
Lipocaic answered 5/8, 2015 at 21:35 Comment(1)
Is it required to multiply by ones(m,1)? – Preconceive
L
0
J = sum(((X*theta)-y).^2)/(2*m);
ans =  32.073

The above answer is perfect. I thought about the problem deeply for a day and am still unfamiliar with Octave, so let's just study together!

Lob answered 28/2, 2017 at 7:45 Comment(2)
Sure, with pleasure. It is based on the cost function and uses matrix multiplication rather than explicit summation or looping. – Lob
I am not sure who gave you the "-", but this is also the solution I came up with. It's cleaner and, I believe, more efficient. Got 100%. – Milligan
R
0

If you want to use only matrix operations:

temp = (X * theta - y);        % h(x) - y
J = ((temp')*temp)/(2 * m);
clear temp;
Rune answered 4/2, 2019 at 17:36 Comment(0)
F
0

This would work just fine for you -

J =  sum((X*theta - y).^2)*(1/(2*m))

This directly follows from the Cost Function Equation

Foulk answered 29/3, 2020 at 21:3 Comment(0)
B
0

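Python code for the same (X, y and theta are assumed to be numpy arrays):

import numpy as np

def computeCost(X, y, theta):
    m = y.size                # number of training examples
    H = X.dot(theta)          # hypothesis for every training example
    S = np.sum((H - y) ** 2)  # sum of squared errors
    J = S / (2 * m)
    return J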
Berna answered 15/5, 2020 at 6:54 Comment(1)
What does H stand for? – Goatsucker
T
-1
function J = computeCost(X, y, theta)

m = length(y);

J = 0;

% Hypothesis h(x)
h = X * theta;

% Error function (h(x) - y) ^ 2
squaredError = (h-y).^2;

% Cost function
J = sum(squaredError)/(2*m);

end
Teresitateressa answered 8/7, 2019 at 16:13 Comment(2)
Please don't post code only as an answer. This is not helpful. Please take your time to provide high quality answers. Note: "This answer was flagged as low-quality because of its length and content." If you don't improve the quality of your answer, this post might get deleted. – Rovit
@Zoe What is wrong? I just informed the author that his post was flagged as low-quality and will probably be deleted. Posting code without any explanation is not a good answer. I didn't flag it though. This was just meant as friendly advice. – Rovit
U
-3

I think we need to use iteration for a more general cost solution, rather than a single evaluation. Also, the 32.07 result shown in the PDF may not be the answer the grader is looking for, because it is only one case out of many possible training sets.

I think it should loop like this:

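  for i = 1:iteration
    % update every theta value at once (X is the full feature matrix)
    theta = theta - (alpha/m) * (X' * (X*theta - y));
    % cost after this update
    j = (1/(2*m)) * sum((X*theta - y).^2);
  end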
Underwent answered 8/12, 2015 at 4:50 Comment(1)
Vectorizing your code is a better way of handling matrix operations than iterating over a matrix in a for loop. – Boothman
