Error in ConfusionMatrix the data and reference factors must have the same number of levels

I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
                                   times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
                 trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

The error occurs when generating the confusion matrix, yet the levels are the same on both objects, so I can't figure out what the problem is. Their structure and levels are given below. Any help would be greatly appreciated, as it's driving me crazy!

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...

> levels(predictionsTree)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"   

> levels(testdata$category)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"       
Thankyou answered 17/7, 2014 at 10:44 Comment(3)
In your error, category is spelled catgeory. If the problem is not related, what's the output of identical(levels(predictionsTree), levels(testdata$category))? - Gizmo
Hi, thanks for that. I amended the silly spelling mistake (doh!). I ran the identical function and it returned [1] TRUE. Now I'm getting the following error when I run the confusionMatrix function: Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length - Thankyou
Check for another misspelled catgeory, check length(testdata$category) and length(predictionsTree), and also check summary() of both vectors. If you just want a simple confusion matrix: table(predictionsTree, testdata$category) - Gizmo
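The checks suggested in this comment can be bundled into one small base-R helper. The function name check_cm_inputs and the toy vectors below are hypothetical; the helper only reports lengths, factor status, level agreement and NA counts, which are the usual culprits behind both error messages in this thread.

```r
# Hypothetical helper: inspect the two vectors before calling confusionMatrix()
check_cm_inputs <- function(pred, ref) {
  list(
    same_length = length(pred) == length(ref),
    lengths     = c(pred = length(pred), ref = length(ref)),
    both_factor = is.factor(pred) && is.factor(ref),
    same_levels = identical(levels(pred), levels(ref)),
    n_na        = c(pred = sum(is.na(pred)), ref = sum(is.na(ref)))
  )
}

# Toy example: one prediction is missing, as when predict() drops an incomplete row
classes <- c("a", "b", "c")
pred <- factor(c("a", "b", "b"), levels = classes)
ref  <- factor(c("a", "b", "c", "c"), levels = classes)

res <- check_cm_inputs(pred, ref)
res$same_length  # FALSE: 3 predictions vs 4 reference labels
res$same_levels  # TRUE: the level sets agree, so only the length is wrong
```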

Try using:

confusionMatrix(table(predictionsTree, testdata$category))

That worked for me.
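Building the table yourself makes a mismatch visible before confusionMatrix() sees it. The base-R sketch below uses made-up labels: it tabulates two factors that share one level set and checks that the result is square with the same labels on both axes, which is the shape a confusion matrix needs. The caret call is left as a comment because it assumes caret is installed.

```r
classes   <- c("cat", "dog", "fox")
predicted <- factor(c("cat", "dog", "dog", "cat"), levels = classes)
actual    <- factor(c("cat", "dog", "fox", "cat"), levels = classes)

tab <- table(Prediction = predicted, Reference = actual)
print(tab)

# A confusion matrix needs a square table with matching row and column labels
stopifnot(nrow(tab) == ncol(tab), identical(rownames(tab), colnames(tab)))

# With caret installed: caret::confusionMatrix(tab)
```

Because both inputs are factors with the same levels, the never-predicted class "fox" still gets a (zero) row. With plain character vectors, table() builds rows and columns only from the values each vector actually contains, so the labels on the two axes can differ.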

Superpower answered 3/6, 2018 at 11:14 Comment(0)

Maybe your model never predicts a certain factor level. Use the table() function instead of confusionMatrix() to check whether that is the problem.
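A base-R sketch of this check, and of one graceful workaround, using made-up labels: when predictions are stored as characters (or as a factor built only from the predicted values), a class the model never predicts is missing from the level set. Re-creating both vectors with one explicit, shared set of levels restores the match.

```r
actual_chr    <- c("low", "mid", "high", "mid", "high")
predicted_chr <- c("low", "mid", "mid", "mid", "mid")  # "high" is never predicted

# Factors built from the values alone end up with different numbers of levels
nlevels(factor(predicted_chr))  # 2
nlevels(factor(actual_chr))     # 3

# Workaround: give both factors the same explicit levels
all_levels <- union(actual_chr, predicted_chr)
predicted  <- factor(predicted_chr, levels = all_levels)
actual     <- factor(actual_chr, levels = all_levels)

stopifnot(identical(levels(predicted), levels(actual)))
table(predicted, actual)  # the "high" row is present and all zeros
```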

Franciskus answered 31/10, 2014 at 5:36 Comment(3)
You could add this as a comment. - Physostomous
I found this very helpful, but now I'm wondering: there doesn't seem to be much difference between the two. Is it only graphical? - Kincardine
If this is the case, how can we fix it or work around it gracefully? - Gracioso

Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata, na.action = na.pass)
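Why this helps: with na.action = na.omit, rows of the new data that contain NA are dropped before predicting, so you get fewer predictions than reference labels; with na.pass those rows are kept, so the lengths line up (the model either handles the NAs or returns NA for them). As far as I know, caret's predict() method for train objects defaults to na.omit. The sketch below shows the same length effect with base R's lm() and made-up data rather than caret, so treat it as an illustration of the mechanism only.

```r
train_df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))
fit <- lm(y ~ x, data = train_df)

new_df <- data.frame(x = c(1.5, NA, 3.5))  # second row is incomplete

p_omit <- predict(fit, new_df, na.action = na.omit)  # incomplete row dropped
p_pass <- predict(fit, new_df, na.action = na.pass)  # incomplete row kept as NA

length(p_omit)  # 2, one short of nrow(new_df)
length(p_pass)  # 3, aligned with new_df
```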
Subrogate answered 12/11, 2015 at 3:2 Comment(0)

Put them into a data frame and then pass them to the confusionMatrix function:

library(caret)

predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)

# Stack both vectors; rbind() merges the factor levels of the two "data" columns
my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
my_data3 <- rbind(my_data1, my_data2)

# Check that the levels are identical (should return TRUE)
identical(levels(my_data3[my_data3$type == "prediction", 1]), levels(my_data3[my_data3$type == "real", 1]))

confusionMatrix(my_data3[my_data3$type == "prediction", 1], my_data3[my_data3$type == "real", 1], dnn = c("Prediction", "Reference"))
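This works because rbind() on data frames expands the levels of a factor column to cover every level encountered, so both subsets carry the union of the two level sets. The base-R sketch below (toy labels) shows that, plus a shorter equivalent using factor(..., levels = union(...)); the shorter form is a suggestion, not part of the original answer.

```r
pred <- factor(c("a", "b", "a"))  # levels: a, b
real <- factor(c("a", "c", "b"))  # levels: a, b, c

stacked <- rbind(data.frame(data = pred, type = "prediction"),
                 data.frame(data = real, type = "real"))
levels(stacked$data)  # "a" "b" "c"

# Subsetting keeps every level, so both halves now share one level set
p1 <- stacked$data[stacked$type == "prediction"]
r1 <- stacked$data[stacked$type == "real"]
stopifnot(identical(levels(p1), levels(r1)))

# Shorter alternative: build both factors over the union of their levels
lv <- union(levels(pred), levels(real))
p2 <- factor(pred, levels = lv)
r2 <- factor(real, levels = lv)
stopifnot(identical(levels(p2), levels(r2)))
```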
Contradance answered 9/8, 2018 at 5:46 Comment(0)

I had the same issue and fixed it by removing incomplete rows right after reading the data file, like so:

data = na.omit(data)

Thanks all for the pointers!
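Before dropping rows it can be worth checking how many are affected. A base-R sketch with a made-up data frame: na.omit() removes every row that has an NA in any column and records the dropped row numbers in its "na.action" attribute.

```r
df <- data.frame(
  amount   = c(10, NA, 25, 40),
  category = factor(c("Misc", "Misc", NA, "68-Payroll"))
)

colSums(is.na(df))  # NAs per column: amount 1, category 1

clean <- na.omit(df)
nrow(clean)                           # 2 rows remain
as.integer(attr(clean, "na.action"))  # rows 2 and 3 were dropped
```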

Ezaria answered 21/11, 2015 at 18:54 Comment(0)

There might be missing values in testdata. Add the following line before predictionsTree <- predict(treeFit, testdata) to remove rows containing NAs. I had the same error and now it works for me.

testdata <- testdata[complete.cases(testdata),]
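Because complete.cases() returns a logical vector, it can also be restricted to the columns the model actually uses, which keeps rows whose NAs sit only in unused columns. A base-R sketch with a made-up frame; the column names are hypothetical, and this only helps if those other columns really are unused (with a formula like category ~ . every column is a predictor).

```r
test_df <- data.frame(
  amount   = c(5, 12, NA, 7),
  weekday  = c("Mon", NA, "Wed", "Thu"),  # hypothetical column not used by the model
  category = factor(c("Misc", "Misc", "68-Payroll", "68-Payroll"))
)

# Drop rows with an NA anywhere
all_cols <- test_df[complete.cases(test_df), ]
nrow(all_cols)   # 2

# Drop rows with an NA only in the columns the (hypothetical) model uses
model_cols <- c("amount", "category")
used_cols  <- test_df[complete.cases(test_df[, model_cols]), ]
nrow(used_cols)  # 3
```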
Wethington answered 11/1, 2015 at 7:12 Comment(0)

The length problem you're running into is probably due to NAs in the data you predict on, since caret's predict() drops incomplete rows by default. Either drop the cases that are not complete, or impute so that you do not have missing values.
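For the imputation route, a deliberately simple base-R sketch: fill numeric NAs with the column median. The helper name impute_median and the data are hypothetical; caret's preProcess() offers more principled options (for example method = "medianImpute" or "knnImpute"). The medians are learned on the training set and reused on the test set so that no information leaks from the test data.

```r
# Hypothetical helper: replace NAs in numeric columns with the median of `ref_df`
impute_median <- function(df, ref_df = df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(ref_df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- med
    }
  }
  df
}

train_df <- data.frame(amount = c(10, 20, NA, 40), n = c(1L, NA, 3L, 5L))
test_df  <- data.frame(amount = c(NA, 15), n = c(2L, NA))

# Learn the medians on the training data, apply them to both sets
train_imp <- impute_median(train_df)
test_imp  <- impute_median(test_df, ref_df = train_df)
```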

Dyane answered 21/5, 2015 at 21:6 Comment(0)

Make sure you installed the package with all its dependencies:

install.packages('caret', dependencies = TRUE)

Then pass a table to confusionMatrix():

confusionMatrix(table(prediction, true_value))
Nordic answered 24/6, 2019 at 19:58 Comment(0)

If your data contains NAs, they are sometimes treated as a factor level, so omit them first:

DF = na.omit(DF)

Then, if your model is predicting some incorrect level, it is better to use a table:

confusionMatrix(table(predictionsTree, testdata$category))
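To illustrate how NA can end up as a level (here via factor(..., exclude = NULL), which is only one way this can happen) and how to undo it, a base-R sketch with toy labels:

```r
ref  <- factor(c("a", "b", "a"))                 # 2 levels
pred <- factor(c("a", NA, "b"), exclude = NULL)  # NA kept as a third level

nlevels(ref)   # 2
nlevels(pred)  # 3, so the level counts no longer match

# Converting back to character and re-factoring with the reference levels drops the NA level
pred_fixed <- factor(as.character(pred), levels = levels(ref))
identical(levels(pred_fixed), levels(ref))  # TRUE
```

The NA entry itself remains in pred_fixed, so the corresponding row still has to be removed, as the answer suggests.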
Judkins answered 17/7, 2019 at 13:3 Comment(0)

I just ran into the same problem and solved it by using R's ordered factor data type:

lvls <- levels(predictionsTree)
lvls <- lvls[order(lvls)]  # one sorted level set shared by both vectors
table(ordered(predictionsTree, levels = lvls), ordered(testdata$category, levels = lvls))
Crisscross answered 8/12, 2020 at 19:6 Comment(0)

Check the data types! My issue was that data had type int and reference had type num. They need to be the same type.
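A small base-R sketch of the fix, assuming numeric class codes 0/1 made up for illustration: convert both vectors to factors with one explicit level set, so their storage types no longer matter. As far as I know, recent caret versions expect both data and reference to be factors in any case.

```r
pred_num <- c(1, 0, 1, 1)      # double ("num"), e.g. from thresholding probabilities
ref_int  <- c(1L, 0L, 0L, 1L)  # integer ("int")

typeof(pred_num)  # "double"
typeof(ref_int)   # "integer"

# Convert both to factors over the same levels
lv   <- c("0", "1")
pred <- factor(pred_num, levels = lv)
ref  <- factor(ref_int, levels = lv)

identical(levels(pred), levels(ref))  # TRUE
table(Prediction = pred, Reference = ref)
```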

Tandem answered 13/6, 2023 at 9:52 Comment(1)
Your answer could be improved by sharing an example with code showing how to solve it. Generally, try to be as specific as possible. - Alsace

© 2022 - 2025 — McMap. All rights reserved.