Error faced while using TM package's VCorpus in R

I get the error below when using the tm package in R.

library("tm")
Loading required package: NLP
Warning messages:
1: package ‘tm’ was built under R version 3.4.2 
2: package ‘NLP’ was built under R version 3.4.1 

corpus <- VCorpus(DataframeSource(data))

Error: all(!is.na(match(c("doc_id", "text"), names(x)))) is not TRUE

I have tried reinstalling the package and updating to a newer version of R, but the error persists. The same code runs without error on the same data file on another system with the same version of R.

Sphenoid answered 21/11, 2017 at 6:27

I hit the same problem after updating the tm package to version 0.7-2. The documentation for DataframeSource() states:

The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text".

Details

A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a "UTF-8" encoded string representing the document's content. Optional additional columns are used as document level metadata.
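To see quickly whether a data frame has the layout described above, a small base R check can help. This is only a sketch: check_tm_columns is a hypothetical helper name, and the example data frames are made up for illustration.

```r
# Hypothetical helper: TRUE if the first two columns are named
# "doc_id" and "text", in that order, as the documentation requires.
check_tm_columns <- function(x) {
  ncol(x) >= 2 && identical(names(x)[1:2], c("doc_id", "text"))
}

# Made-up example data frames
good <- data.frame(doc_id = c("a", "b"),
                   text = c("first document", "second document"),
                   stringsAsFactors = FALSE)
bad <- data.frame(id = 1:2,
                  English.title = c("first title", "second title"),
                  stringsAsFactors = FALSE)

ok_good <- check_tm_columns(good)
ok_bad <- check_tm_columns(bad)
print(c(good = ok_good, bad = ok_bad))
```

Running names(data) on your own data frame shows the same information by eye.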

I solved it with the following code:

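# Read the CSV, keeping strings as character vectors rather than factors
df_cmp <- read.csv("test_file.csv", stringsAsFactors = FALSE)

# Build a data frame whose first two columns are doc_id and text
df_title <- data.frame(doc_id = row.names(df_cmp),
                       text = df_cmp$English.title,
                       stringsAsFactors = FALSE)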

In other words, rename your columns to doc_id and text, and make sure they are the first and second columns of the data frame.
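If you would rather fix an existing data frame than build a new one, something like the following might work. It is a sketch under assumptions: the data frame and its column names (id, English.title, year) are made up, and the corpus is only built if tm is installed.

```r
# Made-up stand-in for the asker's `data`; column names are hypothetical.
data <- data.frame(
  id = c(101, 102, 103),
  English.title = c("First title", "Second title", "Third title"),
  year = c(2015, 2016, 2017),
  stringsAsFactors = FALSE
)

# Rename the identifier and content columns to the names tm expects.
names(data)[names(data) == "id"] <- "doc_id"
names(data)[names(data) == "English.title"] <- "text"

# doc_id should be a string identifier. Put doc_id and text first and
# keep the remaining columns, which tm treats as document-level metadata.
data$doc_id <- as.character(data$doc_id)
data <- data[, c("doc_id", "text", setdiff(names(data), c("doc_id", "text")))]

# Build the corpus only if tm is available on this machine.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::DataframeSource(data))
  print(corpus)
}
```

Reordering matters as well as renaming, since the documentation asks for doc_id first and text second.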

Wadesworth answered 29/11, 2017 at 8:2

I also ran into this error while using the BTM package. As the answer above notes, it may come from your column headings, which must be doc_id and text, in that order. In my case, however, the cause was that my doc_id values had become corrupted and were no longer unique. If the error persists, inspect your doc_id values and confirm that each one is unique and that they increment as expected.
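A rough way to check for repeated doc_id values in base R is sketched below. The data frame is made up, and regenerating the identifiers is only one possible repair: it throws away the original values, so it only makes sense if nothing else refers to them.

```r
# Made-up example in which the doc_id "2" appears twice.
data <- data.frame(
  doc_id = c("1", "2", "2", "4"),
  text = c("alpha", "beta", "gamma", "delta"),
  stringsAsFactors = FALSE
)

# List the identifiers that occur more than once.
dup <- unique(data$doc_id[duplicated(data$doc_id)])
print(dup)

# One possible repair: replace the identifiers with a fresh sequence.
if (length(dup) > 0) {
  data$doc_id <- as.character(seq_len(nrow(data)))
}

# After the repair every identifier should be unique.
all_unique <- anyDuplicated(data$doc_id) == 0
print(all_unique)
```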

Brownley answered 2/6, 2020 at 5:52
