Create_Analytics in RTextTools

Asked 9/5, 2014 at 9:40 Answered 4/2, 2016 at 5:47

r precision text-mining document-classification confusion-matrix

I am trying to classify text documents into a number of categories. The code below works fine:

matrix[[i]] <- create_matrix(trainingdata[[i]][,1], language="english", removeNumbers=FALSE, stemWords=FALSE, weighting=weightTf, minWordLength=3)
container[[i]] <- create_container(matrix[[i]], trainingdata[[i]][,2], trainSize=1:50, testSize=51:100)
models[[i]] <- train_models(container[[i]], algorithms=c("MAXENT","SVM"))
results[[i]] <- classify_models(container[[i]], models[[i]])

When I then run the code below to get precision, recall, and accuracy values:

analytic[[i]]  <- create_analytics(container[[i]], results[[i]])

I get the following error:

Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA_real_, NA_real_ : 
  duplicate 'row.names' are not allowed

My categories are in text format. If I convert them to numeric, the above code works fine.

Is there a workaround that keeps the categories in text format and still gives precision, recall, and accuracy values?

My objective is to get precision, recall, accuracy, and a confusion matrix for a multi-class classifier. Is there any other package that provides these values for a multi-class (one-vs-all) text classifier?

Estradiol answered 9/5, 2014 at 9:40 Comment(4)

Can you check whether factor(Categories) works for you? – Rickrack 10/5, 2014 at 7:18

Is there any workaround for this? – Estradiol 19/5, 2014 at 9:15

I have the same error. In RTextTools::create_analytics there is a local function called create_Topic summary, which has testing_codes <- as.numeric(as.vector(container@testing_codes)). This fills testing_codes with NA when the labels are text. Still looking into it. – Supposition 10/7, 2014 at 21:8

OK, I got it to work. I had to convert the labels to a factor, then convert the factor to a number. I guess the mapping is like "Class 1" -> 1, "Class 2" -> 2. If you look at the example in "RTextTools: A Supervised Learning Package for Text Classification", it uses USCongress$major as the class label, which happens to be an integer. – Supposition 11/7, 2014 at 13:48
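The coercion described in the comments can be reproduced in base R without RTextTools. This is a minimal sketch with made-up labels (the `labels` vector is hypothetical, not from the question's data); it only illustrates why text labels turn into NA and why going through a factor avoids that.

```r
# Hypothetical text labels standing in for the class column
labels <- c("sports", "politics", "sports", "tech")

# The coercion quoted in the comment: text that does not parse as a
# number becomes NA (R also emits a warning, suppressed here)
codes_bad <- suppressWarnings(as.numeric(as.vector(labels)))

# Workaround from the comments: factor first, then take the level codes
label_factor <- factor(labels)
codes_ok <- as.numeric(label_factor)

print(codes_bad)             # every entry is NA
print(codes_ok)              # one number per label, following the level order
print(levels(label_factor))  # the text label behind each number
```

The numbers follow the factor's level order (alphabetical by default), so the same text label always gets the same code within one factor.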

As user3294343 commented, converting my class field to a factor and then to numeric worked for me, as follows:

doc_matrix <- create_matrix(dataset.arff$text, language="english", removeNumbers=TRUE, stemWords=TRUE, removeSparseTerms=.998)
container <- create_container(doc_matrix, as.numeric(factor(dataset.arff$"@@class@@")), trainSize=1:1500, testSize=1501:1999, virgin=FALSE)

That solved the error for me.
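If the text categories are still needed for reporting, one option is to keep the factor around and use its levels to translate numeric codes back. The sketch below is base R only; `class_text` and `predicted_raw` are hypothetical stand-ins, and it assumes the predictions come back as numbers stored in a factor or character vector.

```r
# Hypothetical text categories
class_text <- c("spam", "ham", "ham", "spam", "other")

# Keep the factor so the code-to-label mapping is not lost
class_factor <- factor(class_text)
class_codes <- as.numeric(class_factor)  # numeric labels for the container

# Hypothetical predictions returned as codes stored in a factor.
# Go through as.character first: as.numeric() on a factor returns the
# internal level index, not the number printed in the label.
predicted_raw <- factor(c("3", "1", "1", "2", "3"))
predicted_codes <- as.numeric(as.character(predicted_raw))

# Translate the codes back to the original text categories
predicted_text <- levels(class_factor)[predicted_codes]
print(predicted_text)
```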

Rilke answered 16/12, 2014 at 0:38 Comment(0)

The trick mentioned above worked for me: convert the labels to a factor, then to numeric.

matrix <- create_matrix(combinedDF["error"], language="english",
                        removeNumbers=TRUE, stemWords=FALSE, weighting=tm::weightTfIdf)
len <- dim(combinedDF)[1]
container <- create_container(matrix, as.numeric(factor(combinedDF$class)),
                              trainSize=1:len, testSize=1:len, virgin=TRUE)
maxent_model <- train_model(container,"MAXENT")
maxent_results <- classify_model(container,maxent_model)
analytics <- create_analytics(container, maxent_results, b=1)
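For the confusion matrix and per-class metrics the question asks about, text labels can also be handled directly in base R, independent of RTextTools. The following is a rough sketch on made-up vectors (`actual` and `predicted` are hypothetical); it is not what create_analytics does internally.

```r
# Hypothetical true and predicted text labels, sharing the same levels
lv <- c("politics", "sports", "tech")
actual <- factor(c("politics", "sports", "tech", "politics",
                   "sports", "tech", "politics", "sports"), levels = lv)
predicted <- factor(c("politics", "sports", "tech", "sports",
                      "sports", "politics", "politics", "tech"), levels = lv)

# Confusion matrix: rows are predictions, columns are true classes
confusion <- table(Predicted = predicted, Actual = actual)

# Correct predictions per class sit on the diagonal
true_pos <- diag(confusion)

# One-vs-all metrics per class
precision <- true_pos / rowSums(confusion)  # correct / all predicted as class
recall <- true_pos / colSums(confusion)     # correct / all truly in class
accuracy <- sum(true_pos) / sum(confusion)

print(confusion)
print(round(cbind(precision, recall), 3))
print(accuracy)
```

Giving both factors the same `levels` keeps the table square even when a class is never predicted; a class with no predictions would give a 0/0 precision (NaN), which would need separate handling.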

Clique answered 4/2, 2016 at 5:47 Comment(1)

Can you please explain the difference between your answer and the answer from dsg? – Newmarket 10/1, 2017 at 11:51
