Read csv file in R with currency column as numeric

B

6

13

I'm trying to read into R a csv file that contains information on political contributions. From what I understand, the columns by default are imported as factors, but I need the the amount column ('CTRIB_AMT' in the dataset) to be imported as a numeric column so I can run a variety of functions that wouldn't work for factors. The column is formatted as a currency with a "$" as prefix.

I used a simple read command to import the file initially:

contribs <- read.csv('path/to/file')

And then tried to convert the CTRIB_AMT from currency to numeric:

as.numeric(as.character(sub("$","",contribs$CTRIB_AMT, fixed=TRUE)))

But that didn't work. The functions I'm trying to use for the CTRIB_AMT columns are:

vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)

See related question here.

Any thoughts on how to import file initially so column is numeric or how to convert it after importing?

Blubbery answered 7/9, 2011 at 17:11 Comment(3)

A small example of the file contents would be helpful. – Hooked 7/9, 2011 at 17:16

The sub -> as.character -> as.numeric conversion should do the trick. What do you mean by "didn't work"? – Clavus 7/9, 2011 at 17:25

Works for me. CTRIB_AMT <- factor(c("$5000","$2500","$100")); as.numeric(as.character(sub("$","",CTRIB_AMT, fixed=TRUE))) – Aforementioned 7/9, 2011 at 17:44

G

18

I'm not sure how to read it in directly, but you can modify it once it's in:

> A <- read.csv("~/Desktop/data.csv")
> A
  id   desc price
1  0  apple $1.00
2  1 banana $2.25
3  2 grapes $1.97
> A$price <- as.numeric(sub("\\$","", A$price))
> A
  id   desc price
1  0  apple  1.00
2  1 banana  2.25
3  2 grapes  1.97
> str(A)
'data.frame':   3 obs. of  3 variables:
 $ id   : int  0 1 2
 $ desc : Factor w/ 3 levels "apple","banana",..: 1 2 3
 $ price: num  1 2.25 1.97

I think it might just have been a missing escape in your sub. $ indicates the end of a line in regular expressions. \$ is a dollar sign. But then you have to escape the escape...

Granadilla answered 7/9, 2011 at 17:48 Comment(3)

I initially thought there was a missing escape too, but fixed=TRUE takes care of that ... – Aforementioned 7/9, 2011 at 17:53

+1 I also found gsub( "[$,]", "", . . . useful for figures like "$1,234". – Celluloid 9/4, 2013 at 1:18

This should not be the accepted solution; it is inferior to other solutions which do the substitution at read-time, at read-time, return numeric, and thus avoid creating lots of unnecessary and unique strings, which could waste Mb/Gb on a large file, and also be slower. – Solferino 21/2, 2019 at 0:25

R

14

Another way could be setting conversion using setAs.
It was used in two (similar) question:

For your needs:

setClass("Currency")
setAs("character", "Currency",
    function(from) as.numeric(sub("$","",from, fixed=TRUE)))

contribs <- read.csv("path/to/file", colClasses=c(CTRIB_AMT="Currency"))

Replication answered 8/9, 2011 at 10:43 Comment(0)

R

5

Yet another solution for a problem solved long time ago:

convertCurrency <- function(currency) {
  currency1 <- sub('$','',as.character(currency),fixed=TRUE)
  currency2 <- as.numeric(gsub('\\,','',as.character(currency1))) 
  currency2
}

contribs$CTRIB_AMT_NUM <- convertCurrency(contribs$CTRIB_AMT)

Responsibility answered 7/4, 2013 at 19:30 Comment(0)

I

4

Taking advantage of the powerful parsers the readr package offers out of the box:

my_parser <- function(col) {
  # Try first with parse_number that handles currencies automatically quite well
  res <- suppressWarnings(readr::parse_number(col))
  if (is.null(attr(res, "problems", exact = TRUE))) {
    res
  } else {
    # If parse_number fails, fall back on parse_guess
    readr::parse_guess(col)
    # Alternatively, we could simply return col without further parsing attempt
  }
}

library(dplyr)

name <- c('john','carl', 'hank')
salary <- c('$23,456.33','$45,677.43','$76,234.88')
emp_data <- data.frame(name,salary)

emp_data %>% 
  mutate(foo = "USD13.4",
         bar = "£37") %>% 
  mutate_all(my_parser)

#   name   salary  foo bar
# 1 john 23456.33 13.4  37
# 2 carl 45677.43 13.4  37
# 3 hank 76234.88 13.4  37

Ingvar answered 16/7, 2018 at 15:17 Comment(0)

O

2

Or use something like as.numeric(substr(as.character(contribs$CTRIB_AMT),2,20)) we know that there certainly won't be more than 20 characters.

Another thing to note is that you can remove the need to convert from a factor alltogether if you set stringsAsFactors=F in your call to read.csv()

Ostracod answered 7/9, 2011 at 17:28 Comment(0)

A

0

A more modern answer now perhaps:

read_csv from the readr package has a col_number() parser that can deal with comma separators and currency symbols.

Just used it to parse some stock data downloaded from NASDAQ:

# Downloaded from https://www.nasdaq.com/market-activity/stocks/amzn/historical
AMZN <- 
  read_csv("data/AMZN_HistoricalData_1640763915240.csv", 
    col_types = cols(
      Date = col_date(format = "%m/%d/%Y"),
      `Close/Last` = col_number(),
      Volume = col_integer(),
      Open = col_number(),
      High = col_number(),
      Low = col_number())
  )

Archoplasm answered 29/12, 2021 at 8:38 Comment(0)

Recommended topics

Hot tags