predict.svm does not predict new data

Unfortunately, I have problems using predict() in the following simple example:

library(e1071)

x <- c(1:10)
y <- c(0,0,0,0,1,0,1,1,1,1)
test <- c(11:15)

mod <- svm(y ~ x, kernel = "linear", gamma = 1, cost = 2, type="C-classification")

predict(mod, newdata = test)

The result is as follows:

> predict(mod, newdata = test)
   1    2    3    4 <NA> <NA> <NA> <NA> <NA> <NA> 
   0    0    0    0    0    1    1    1    1    1 

Can anybody explain why predict() only gives the fitted values for the training sample (x, y) and ignores the test data?

Thank you very much for your help!

Richard

Longshore answered 16/12, 2010 at 15:0 Comment(1)
PS: using test <- c(11:25) gives "Error in names(ret2) <- rowns : 'names' attribute [15] must be the same length as the vector [10]" – Longshore

It looks like this is because you are misusing the formula interface to svm(). Normally, one supplies a data frame or similar object within which the variables in the formula are searched for. It usually doesn't matter if you don't do this, even if it is not best practice, but when you want to predict, not putting the variables in a data frame gets you in a right mess. The reason it returns the training data is that you don't give newdata an object with a component named x in it. Hence it can't find the new data x, so it returns the fitted values. This is common to most predict methods in R that I know of.
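As a rough illustration of that fallback (assuming the x, y and mod objects from the question are still in the workspace, and if I recall the e1071 behaviour correctly), predict() without usable new data simply hands back the fitted values for the 10 training points:

# no newdata at all: predict() falls back to the fitted values
predict(mod)

# a bare vector is not a data frame with a column named 'x', so the
# training x is found in the workspace and you get the fitted values again
predict(mod, newdata = c(11:15))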

The solution then is to i) put your training data in a data frame and pass this to svm() as the data argument, and ii) supply a new data frame containing x (from test) to predict(). E.g.:

> DF <- data.frame(x = x, y = y)
> mod <- svm(y ~ x, data = DF, kernel = "linear", gamma = 1, cost = 2,
+ type="C-classification")
> predict(mod, newdata = data.frame(x = test))
1 2 3 4 5 
1 1 1 1 1 
Levels: 0 1
Kliman answered 16/12, 2010 at 15:10 Comment(0)

You need newdata to be of the same form, i.e. using a data.frame helps:

R> library(e1071)
Loading required package: class
R> df <- data.frame(x=1:10, y=sample(c(0,1), 10, rep=TRUE))
R> mod <- svm(y ~ x, kernel = "linear", gamma = 1, 
+             cost = 2, type="C-classification", data=df)
R> newdf <- data.frame(x=11:15)
R> predict(mod, newdata=newdf)
1 2 3 4 5
0 0 0 0 0
Levels: 0 1
R>

By the way, this is also shown in the help page for svm():

 ## density-estimation

 # create 2-dim. normal with rho=0:
 X <- data.frame(a = rnorm(1000), b = rnorm(1000))
 attach(X)

 # traditional way:
 m <- svm(X, gamma = 0.1)

 # formula interface:
 m <- svm(~., data = X, gamma = 0.1)
 # or:
 m <- svm(~ a + b, gamma = 0.1)

 # test:
 newdata <- data.frame(a = c(0, 4), b = c(0, 4))
 predict (m, newdata)

So in sum, use the formula interface and supply a data.frame; that is how essentially all modeling functions in R work.
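As a quick illustration of that general pattern (using lm() here purely as an example; it is not taken from the svm() help page):

df  <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
fit <- lm(y ~ x, data = df)                    # formula interface + data argument
predict(fit, newdata = data.frame(x = 11:15))  # new data supplied as a data.frame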

Harebrained answered 16/12, 2010 at 15:9 Comment(3)
Why are you defining a gamma parameter for a linear SVM? Is that standard practice for linear SVMs in e1071? I have only ever seen that with RBF SVMs. – Eo
I will go and find my time machine to ask my younger self why I wrote that nine years ago, but from the context I just cited the help page, which may or may not have changed in the interim. – Harebrained
I didn't realize it was nine years ago, sorry. I see the original poster used gamma as well. I was just curious because my question was why we are defining gamma for a linear SVM in general. – Eo
