Error converting text to lowercase with tm_map(..., tolower)

I tried using tm_map(), but it gave the following error. How can I get around this?

require(tm)
byword <- tm_map(byword, tolower)

Error in UseMethod("tm_map", x) : 
  no applicable method for 'tm_map' applied to an object of class "character"
Ozan asked 30/11, 2012 at 6:35
What package is tm_map from? This seems to depend on a non-base package. Please consider including the library() statement for completeness. - Ta
@DanielKrizian: tm_map() is from the tm package, and tolower() is from base R. - Cramped

Use the base R function tolower():

tolower(c("THE quick BROWN fox"))
# [1] "the quick brown fox"
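Because the question's byword is a plain character vector, tolower() can be applied to it directly, and a corpus can be built afterwards if other tm functions are still needed. A minimal sketch, assuming some made-up sample strings in place of the real byword:

```r
library(tm)

# hypothetical sample data standing in for the question's `byword`
byword <- c("THE quick BROWN fox", "Jumps OVER the Lazy Dog")

# tolower() is vectorised, so every element is lowercased in one call
byword <- tolower(byword)

# build a corpus only afterwards, if tm functions are still needed
corpus <- VCorpus(VectorSource(byword))
content(corpus[[2]])
```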
Electrolyte answered 30/11, 2012 at 6:41
Thanks. But any insight into why I got that error? I might need to use other tm_map applications! - Ozan
The help file for tm_map (in the tm package) lists the usable transformation functions, and tolower is not one of them. The transformations appear to be S3 methods that operate on objects of class 'Corpus', so you can't use just any function with tm_map. - Electrolyte
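The error message itself points at the class of the input: tm_map() has methods for corpus objects but none for "character". The sketch below (with hypothetical sample data) reproduces the error, wraps the vector in a corpus, and lists the transformations tm provides for direct use via getTransformations():

```r
library(tm)

byword <- c("THE quick BROWN fox", "Jumps OVER the Lazy Dog")  # hypothetical data

# a character vector has no tm_map() method, reproducing the question's error
err <- tryCatch(tm_map(byword, tolower), error = function(e) conditionMessage(e))
print(err)

# wrapping the vector in a corpus gives tm_map() an object it can dispatch on
corpus <- VCorpus(VectorSource(byword))
class(corpus)

# transformations shipped with tm that can be passed to tm_map() unwrapped
getTransformations()
```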

Expanding my comment into a more detailed answer: you have to wrap tolower inside content_transformer() so as not to break the VCorpus object, something like:

> library(tm)
> data('crude')
> crude[[1]]$content
[1] "Diamond Shamrock Corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n    The reduction brings its posted price for West Texas\nIntermediate to 16.00 dlrs a barrel, the copany said.\n    \"The price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n    Diamond is the latest in a line of U.S. oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n Reuter"
> tm_map(crude, content_transformer(tolower))[[1]]$content
[1] "diamond shamrock corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n    the reduction brings its posted price for west texas\nintermediate to 16.00 dlrs a barrel, the copany said.\n    \"the price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n    diamond is the latest in a line of u.s. oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n reuter"
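content_transformer() should work the same way for any function that takes and returns a character vector, including ones with extra arguments, which tm_map() passes through. A sketch on made-up documents; the helper to_space is hypothetical, not part of tm:

```r
library(tm)

docs <- VCorpus(VectorSource(c("Crude-oil PRICES fell", "OIL/gas markets WEAK")))

# hypothetical helper: replace every match of `pattern` with a space
to_space <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

docs <- tm_map(docs, content_transformer(tolower))  # base function, wrapped
docs <- tm_map(docs, to_space, "[-/]")              # extra argument passed on
docs <- tm_map(docs, stripWhitespace)               # built-in, no wrapper needed

content(docs[[1]])

# the documents are still PlainTextDocument objects, so this succeeds
tdm <- TermDocumentMatrix(docs)
```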
Chemash answered 20/7, 2016 at 8:7
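
library(tm)
# build a corpus from the character vector, then lowercase it
myCorpus <- Corpus(VectorSource(byword))
myCorpus <- tm_map(myCorpus, tolower)  # tm >= 0.6: use content_transformer(tolower)

print(myCorpus[[1]])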
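To see why the comments recommend content_transformer(), one can compare what each call leaves inside the corpus. This sketch assumes tm 0.6 or later and uses VCorpus() explicitly, since newer tm releases may return a SimpleCorpus from Corpus(), which behaves differently; the sample data is hypothetical:

```r
library(tm)

byword <- c("THE quick BROWN fox", "Jumps OVER the Lazy Dog")  # hypothetical data
myCorpus <- VCorpus(VectorSource(byword))

bare    <- tm_map(myCorpus, tolower)                       # stores plain strings
wrapped <- tm_map(myCorpus, content_transformer(tolower))  # keeps text documents

class(bare[[1]])     # no longer a text document
class(wrapped[[1]])  # still a PlainTextDocument
content(wrapped[[1]])
```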
Mckenna answered 25/7, 2013 at 17:10
You should wrap tolower inside content_transformer() so as not to break the VCorpus object, like: tm_map(myCorpus, content_transformer(tolower)) - Chemash
@daroczig: Please make that an answer! - Cramped
@Cramped: thanks for the idea, I've just submitted the above comment as a new answer :) - Chemash

Using tolower this way has an undesirable side effect: if you later try to create a term-document matrix from the corpus, it will fail. This is because of a recent change in tm: tolower returns a plain character vector rather than a text document, and tm can no longer handle that. To fix it, convert the documents back afterwards with:

myCorpus <- tm_map(myCorpus, PlainTextDocument)
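A sketch of the failure and the repair, assuming tm 0.6 or later and hypothetical sample data. Rebuilding the documents with PlainTextDocument() creates fresh objects, so per-document metadata such as ids may be reset; wrapping tolower in content_transformer() avoids the problem in the first place.

```r
library(tm)

byword <- c("THE quick BROWN fox", "Jumps OVER the Lazy Dog")  # hypothetical data
myCorpus <- VCorpus(VectorSource(byword))

# bare tolower leaves plain character vectors inside the corpus
myCorpus <- tm_map(myCorpus, tolower)
failed <- tryCatch({ TermDocumentMatrix(myCorpus); FALSE },
                   error = function(e) TRUE)

# re-wrap each string as a PlainTextDocument so the matrix can be built again
myCorpus <- tm_map(myCorpus, PlainTextDocument)
tdm <- TermDocumentMatrix(myCorpus)
```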
Sheerness answered 25/6, 2015 at 19:48

© 2022 - 2024 — McMap. All rights reserved.