I am using the train function in caret to train an SVM with the svmRadial kernel for a binary classification task.
While train runs on my data, it repeatedly prints messages that say:
line search fails -2.13865 -0.1759025 1.01927e-05 3.812143e-06 -5.240749e-08 -1.810113e-08 -6.03178e-13
line search fails -0.7148131 0.1612894 2.32937e-05 3.518543e-06 -1.821269e-08 -1.37704e-08 -4.726926e-13
Once the code has finished running, I also receive the following warnings:
> warnings()
Warning messages:
1: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
2: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
3: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
4: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
5: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
6: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
7: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
8: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
9: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
10: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
11: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
12: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
13: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
14: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
15: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
16: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
17: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
18: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
19: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
20: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
21: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
22: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
23: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
24: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
25: In method$predict(modelFit = modelFit, newdata = newdata, ... :
kernlab class prediction calculations failed; returning NAs
26: In method$prob(modelFit = modelFit, newdata = newdata, ... :
kernlab class probability calculations failed; returning NAs
27: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
28: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... :
There were missing values in resampled performance measures.
As you can see, the warnings above mention NA values being returned for some of the class probability calculations. Why would these calculations be failing?
As per @HFBrowning's request, here is a sample of the data I am using. I am trying to build a classifier to predict whether a telecommunications cell is Overshooting or Not.Overshooting (the Class column).
> head(imbal_training,10)
Total.Tx.Height Antenna.Tilt Antenna.Gain Ant.Vert.Beamwidth RTWP Voice.Drops Range Max.Distance Rural Suburban Urban
2 31.25 0 15.9 10.0 -103.55396 12 5.14 6.24 1 0 0
5 31.25 0 18.2 4.4 -104.76192 1 3.88 4.98 1 0 0
7 25.14 4 15.9 9.6 -102.93839 1 6.58 9.17 1 0 0
9 25.14 2 18.8 4.3 -104.23198 4 5.08 7.67 1 0 0
11 10.66 4 16.2 10.0 -98.23691 17 23.33 24.69 0 1 0
12 10.66 6 16.2 10.0 -103.78522 5 18.24 19.60 0 1 0
13 10.66 5 16.2 10.0 -94.59940 5 20.20 21.56 0 1 0
14 10.66 3 18.7 4.4 -103.17622 3 23.86 25.22 0 1 0
15 10.66 5 18.7 4.4 -104.97827 0 23.86 25.22 0 1 0
16 10.66 4 18.8 4.4 -105.78948 1 23.86 25.22 0 1 0
Class HSUPA.Throughput Max.HSDPA.Users HS.DSCH.throughput Max.HSUPA.Users Avg.CQI
2 Not.Overshooting 222.62 16 2345.54 25 17.99
5 Overshooting 263.83 8 3894.07 13 21.82
7 Overshooting 392.66 14 5134.80 15 23.00
9 Overshooting 478.58 8 7203.39 8 24.70
11 Overshooting 173.21 11 2429.06 15 23.51
12 Overshooting 210.61 16 2694.93 20 19.76
13 Overshooting 205.81 11 3278.06 13 22.10
14 Overshooting 394.10 10 3881.88 13 25.01
15 Overshooting 371.71 10 3765.10 13 23.33
16 Overshooting 321.32 6 4422.15 8 24.85
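In case it is relevant, here is the kind of sanity check I can run on the predictors above, looking for non-finite values and near-zero-variance columns (a rough sketch; nearZeroVar is from caret):
library(caret)

# keep only the numeric predictor columns (everything except the Class factor)
num_cols <- sapply(imbal_training, is.numeric)

# count NA/NaN/Inf entries per numeric predictor
colSums(!is.finite(as.matrix(imbal_training[, num_cols])))

# flag predictors with zero or near-zero variance
nearZeroVar(imbal_training[, num_cols], saveMetrics = TRUE)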
Here is the code for my train control:
#run the algorithms using 10-fold cross-validation, repeated 3 times
set.seed(123)
train_Control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3,
                              savePredictions = TRUE,
                              classProbs = TRUE,                 #required for the ROC curve calcs
                              summaryFunction = twoClassSummary) #uses AUC to pick the best model
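Since classProbs = TRUE requires the outcome factor levels to be valid R variable names, a quick check on the Class levels looks like this (a small sketch; rose_train is the ROSE-balanced training set used in the fit below):
# the class labels as stored in the outcome factor
levels(rose_train$Class)

# TRUE only if every level is already a syntactically valid R name
all(levels(rose_train$Class) == make.names(levels(rose_train$Class)))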
And here is my train function:
#fit the radial-kernel SVM on the ROSE-balanced training set (rose_train)
set.seed(123)
fit.rose.Kernel <- train(Class ~ Total.Tx.Height +
                                 Antenna.Tilt +
                                 Antenna.Gain +
                                 Ant.Vert.Beamwidth +
                                 RTWP +
                                 Voice.Drops +
                                 Range +
                                 Max.Distance +
                                 Rural +
                                 Suburban +
                                 Urban +
                                 HSUPA.Throughput +
                                 Max.HSDPA.Users +
                                 HS.DSCH.throughput +
                                 Max.HSUPA.Users +
                                 Avg.CQI,
                         data = rose_train,
                         method = 'svmRadial',
                         preProcess = c('center', 'scale'),
                         trControl = train_Control,
                         tuneLength = 15,
                         metric = "ROC")
To better understand which section of the code was causing problems, I cleared all the existing warnings and ran each model piece by piece to see where it was flagging.
Initially I identified lines 444 to 469 as the problematic section, but today that part ran without any warnings. Now the next few lines are throwing up the same warnings as the previous day, even though nothing has changed except clearing the warnings.
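One way I can tie each warning to the call that produces it, instead of relying on warnings() after the whole script has run, is to wrap the individual train() calls roughly like this (a sketch; Class ~ . stands in for the full explicit formula above):
fit.rose.Kernel <- withCallingHandlers(
  train(Class ~ .,                        # same predictors as the explicit formula
        data = rose_train,
        method = 'svmRadial',
        preProcess = c('center', 'scale'),
        trControl = train_Control,
        tuneLength = 15,
        metric = "ROC"),
  warning = function(w) {
    message("warning during this fit: ", conditionMessage(w))
    invokeRestart("muffleWarning")        # keep it out of the accumulated warnings()
  }
)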
In summary, I have two types of model I am trying to compare: a linear SVM using svmLinear and a kernel model using svmRadial.
For both models I am using different configurations of training data, since my original dataset was heavily imbalanced towards "overshooting" (roughly 80/20). I used the original unbalanced data, then down-sampled, up-sampled, and used SMOTE and ROSE to generate synthetic data, and I trained both the linear and kernel models on each type of training set.
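For reference, the balanced training sets were built roughly along these lines (a sketch, not my exact code; SMOTE comes from the DMwR package, ROSE from the ROSE package, and the object names other than rose_train are placeholders):
library(caret)   # downSample / upSample
library(DMwR)    # SMOTE
library(ROSE)    # ROSE

set.seed(123)
predictors  <- imbal_training[, names(imbal_training) != "Class"]

down_train  <- downSample(x = predictors, y = imbal_training$Class, yname = "Class")
up_train    <- upSample(x = predictors,   y = imbal_training$Class, yname = "Class")
smote_train <- SMOTE(Class ~ ., data = imbal_training)
rose_train  <- ROSE(Class ~ ., data = imbal_training, seed = 123)$data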
Does anyone know what these "line search fails" messages and warnings are referring to?
In order to provide a reproducible example, here is a link to a copy of my code and here is the dput version of the dataset I am using. The part of the code that is causing these messages and warnings starts at line 444.
If anyone can provide some help, I would be very grateful.