Mutate multiple columns in a dataframe
Asked Answered
W

6

19

I have a data set that looks like this.

bankname    bankid   year    totass    cash    bond    loans
Bank A      1        1881    244789    7250    20218   29513
Bank B      2        1881    195755    10243   185151  2800
Bank C      3        1881    107736    13357   177612  NA
Bank D      4        1881    170600    35000   20000   5000
Bank E      5        1881    3200000   351266  314012  NA

and I want to compute some ratios based on bank balance sheets. and I want the dataset to look like this

bankname    bankid   year    totass    cash    bond    loans    CashtoAsset   BondtoAsset    LoanstoAsset
Bank A      1        1881    2447890   7250    202100  951300   0.002         0.082          0.388
Bank B      2        1881    195755    10243   185151  2800     0.052         0.945          0.014
Bank C      3        1881    107736    13357   177612  NA       0.123         1.648585431    NA
Bank D      4        1881    170600    35000   20000   5000     0.205         0.117          0.029
Bank E      5        1881    32000000  351266  314012  NA       0.0109        0.009          NA

Here is the code to replicate the data

bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E")
bankid <- c( 1, 2,  3,  4,  5)
year<- c( 1881, 1881,   1881,   1881,   1881)
totass  <- c(244789,    195755, 107736, 170600, 32000000)
cash<-c(7250,10243,13357,35000,351266)
bond<-c(20218,185151,177612,20000,314012)
loans<-c(29513,2800,NA,5000,NA)
bankdata<-data.frame(bankname, bankid,year,totass, cash, bond, loans)

First, I got rid of NAs in balance sheets.

cols <- c("totass", "cash", "bond", "loans")
bankdata[cols][is.na(bankdata[cols])] <- 0

Then I compute ratios

library(dplyr)
bankdata<-mutate(bankdata,CashtoAsset = cash/totass)
bankdata<-mutate(bankdata,BondtoAsset = bond/totass)
bankdata<-mutate(bankdata,loanstoAsset =loans/totass)

But, instead of computing all these ratios line by line, I want to create a look to do this all at once. In Stata, I would do

foreach x of varlist cash bond loans {
by bankid: gen `x'toAsset = `x'/ totass
}

How would I do this?

Wetmore answered 6/10, 2014 at 15:24 Comment(5)
Meta-comments: When translating from one language to another, you need not be too literal. Loops in Stata often work better as array-based calculations in R. (Even the converse can be true: newcomers to Stata from other languages often try loops over observations, which are rarely needed.)Butane
I simplify my variables here, but in my dataset, I have over 20 categories of assets, so having a loop is helpful.Wetmore
I have nothing against loops; similarly the typical R user is surely happy with 20 columns....Butane
@NickCox As always, Thank you for your help. I am learning a lot from you.Wetmore
@akrun Thanks for letting me know. I will do that from now on.Wetmore
A
50

Update (as of the 18th of March, 2019)

There has been a change. We have been using funs() in .funs (funs(name = f(.)). But this is changed (dplyr 0.8.0 above). Instead of funs, now we use list (list(name = ~f(.))). See the following new examples.

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = vars(cash:loans))

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = c("cash", "bond", "loans"))

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = 5:7)

Update (as of the 2nd of December, 2017)

Since I answered this question, I have realized that some SO users have been checking this answer. The dplyr package has changed since then. Therefore, I leave the following update. I hope this will help some R users to learn how to use mutate_at().

mutate_each() is now deprecated. You want to use mutate_at(), instead. You can specify which columns you want to apply your function in .vars. One way is to use vars(). Another is to use a character vector containing column names, which you want to apply your custom function in .fun. The other is to specify columns with numbers (e.g., 5:7 in this case). Note that, if you use a column for group_by(), you need to change the numbers of column positions. Have a look of this question.

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)

#  bankname bankid year   totass   cash   bond loans cash_toAsset bond_toAsset loans_toAsset
#1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
#2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
#3   Bank C      3 1881   107736  13357 177612    NA   0.12397899  1.648585431            NA
#4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
#5   Bank E      5 1881 32000000 351266 314012    NA   0.01097706  0.009812875            NA

I purposely gave toAsset to the custom function in .fun since this will help me to arrange new column names. Previously, I used rename(). But I think it is much easier to clean up column names with gsub() in the present approach. If the above result is saved as out, you want to run the following code in order to remove _ in the column names.

names(out) <- gsub(names(out), pattern = "_", replacement = "")

Original answer

I think you can save some typing in this way with dplyr. The downside is you overwrite cash, bond, and loans.

bankdata %>%
    group_by(bankname) %>%
    mutate_each(funs(whatever = ./totass), cash:loans)

#  bankname bankid year   totass       cash        bond      loans
#1   Bank A      1 1881   244789 0.02961734 0.082593581 0.12056506
#2   Bank B      2 1881   195755 0.05232561 0.945830247 0.01430359
#3   Bank C      3 1881   107736 0.12397899 1.648585431         NA
#4   Bank D      4 1881   170600 0.20515826 0.117233294 0.02930832
#5   Bank E      5 1881 32000000 0.01097706 0.009812875         NA

If you prefer your expected outcome, I think some typing is necessary. The renaming part seems to be something you gotta do.

bankdata %>%
    group_by(bankname) %>%
    summarise_each(funs(whatever = ./totass), cash:loans) %>%
    rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans) -> ana;
    ana %>%
    merge(bankdata,., by = "bankname")

#  bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
#1   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
#2   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
#3   Bank C      3 1881   107736  13357 177612    NA  0.12397899 1.648585431           NA
#4   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
#5   Bank E      5 1881 32000000 351266 314012    NA  0.01097706 0.009812875           NA
Addiction answered 6/10, 2014 at 15:50 Comment(3)
Hi, I am trying all different options posted here. When I tried your codes, I got. `Error: object 'ana' not found. Would you explain to me what is going on? Thanks.Wetmore
@HPark I am assigning an output to the object, ana in the piping process. If this approach does not work for you, you could do, ana <- bankdata %>% group_by(bankname) %>% summarise_each(funs(whatever = ./totass), cash:loans) %>% rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans); ana %>% merge(bankdata,., by = "bank name")Addiction
Reminder from 2022: all of the above have been deprecated; the state of the art would be dplyr::mutate(dplyr::across(.cols = c(cash:loans), .fns=~.x/totass, .names="{.col}ToAsset")). Notable changes: there is no need to user vars() anymore, a simple c() will do; and there is no need to overwrite columns anymore.Vardhamana
M
4

Apply and cbind

cbind(bankdata,apply(bankdata[,5:7],2, function(x) x/bankdata$totass))
names(bankdata)[8:10] <- paste0(names(bankdata)[5:7], 'toAssest’)

> bankdata
  bankname bankid year   totass   cash   bond loans cashtoAssest bondtoAssest loanstoAssest
1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
3   Bank C      3 1881   107736  13357 177612    NA   0.12397899  1.648585431            NA
4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
5   Bank E      5 1881 32000000 351266 314012    NA   0.01097706  0.009812875            NA
Marvelmarvella answered 6/10, 2014 at 17:33 Comment(2)
In the code cbind(bankdata,apply(bankdata[,5:7],2, function(x) x/bankdata$totass)), what does the 2 signify?Frisky
@Skurup, "2" is the "margin" argument within the apply function. It means that the function will be applied to the columns vector. "1" instead would apply the function to rows. ( check ?apply)Marvelmarvella
G
3

Here is a data.table solution.

library(data.table)
setDT(bankdata)
bankdata[, paste0(names(bankdata)[5:7], "toAsset") := 
           lapply(.SD, function(x) x/totass), .SDcols=5:7]
bankdata
#    bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
# 1:   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
# 2:   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
# 3:   Bank C      3 1881   107736  13357 177612     0  0.12397899 1.648585431   0.00000000
# 4:   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
# 5:   Bank E      5 1881 32000000 351266 314012     0  0.01097706 0.009812875   0.00000000
Grappa answered 6/10, 2014 at 16:48 Comment(0)
G
1

This is one of the big downsides of dplyr: as far as I'm aware, there is no straightforward way to use it programmatically rather than interactively without some kind of "hack" like the deplorable eval(parse(text=foo)) idiom.

The simplest approach is the same as in the Stata method, but string manipulation is a little more verbose in R than in Stata (or in any other scripting language, for that matter).

for (x in c("cash", "bond", "loans")) {
  bankdata[sprintf("%stoAsset", x)] <- bankdata[x] / bankdata$totass  # or, equivalently, bankdata["totass"] for a consistent "look"
  ## can also replace `sprintf("%stoAsset", x)` with `paste0(c(x, "toAsset"))` or even `paste(x, "toAsset", collapse="") depending on what makes more sense to you.
}

To make the whole thing more Stata-like, you can wrap the whole thing in within like so:

bankdata <- within(bankdata, for (x in c("cash", "bond", "loans")) {
  assign(x, get(x) / totass)
})

but this entails some hacking with the get and assign functions which aren't as safe to use in general, although in your case it's probably not a big deal. I wouldn't recommend trying similar tricks with dplyr, for instance, because dplyr abuses R's nonstandard evaluation features and it's probably more trouble than it's worth. For a faster and probably superior solution, check out the data.table package which (I think) would allow you to use the Stata-like looping syntax but with dplyr-like speed. Check out the package vignette on CRAN.

Also, are you really, really sure you want to reassign NA entries to 0?

Gantrisin answered 6/10, 2014 at 15:50 Comment(0)
T
0

You might be making this a little harder than necessary. Just try this and see if it yields what you need.

bankdata$CashtoAsset <- bankdata$cash / bankdata$totass
bankdata$BondtoAsset <- bankdata$bond / bankdata$totass
bankdata$loantoAsset <- bankdata$loans / bankdata$totass
bankdata

Yields this:

    bankname bankid year   totass   cash   bond   loans  CashtoAsset BondtoAsset loantoAsset
1   Bank A    1     1881     244789   7250  20218 29513  0.02961734  0.082593581 0.12056506
2   Bank B    2     1881     195755  10243 185151  2800  0.05232561  0.945830247 0.01430359
3   Bank C    3     1881     107736  13357 177612     0  0.12397899  1.648585431 0.00000
4   Bank D    4     1881     170600  35000  20000  5000  0.20515826  0.117233294 0.02930832
5   Bank E    5     1881   32000000 351266 314012     0  0.01097706  0.009812875 0.00000000

This should get you started in the right direction.

Tele answered 6/10, 2014 at 15:46 Comment(0)
M
0

Try:

for(i in 5:7){
     bankdata[,(i+3)] = bankdata[,i]/bankdata[,4]
}
names(bankdata)[(5:7)+3] =  paste0(names(bankdata)[5:7], 'toAssest')

Output:

bankdata
  bankname bankid year   totass   cash   bond loans cashtoAssest bondtoAssest loanstoAssest
1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
3   Bank C      3 1881   107736  13357 177612     0   0.12397899  1.648585431    0.00000000
4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
5   Bank E      5 1881 32000000 351266 314012     0   0.01097706  0.009812875    0.00000000
Mufinella answered 6/10, 2014 at 15:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.