Scale all values depending on group [duplicate]

I have a data frame similar to this one:

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543,3524, 353, 3432, 4542, 6343, 4534 )
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

Now I would like to scale the values in p1 and p2 within each ID. So the column should not be scaled as a whole, which is what I get when using the tapply() function. Instead, scaling should be done once for all values with ID 1, then once for all values with ID 2, and so on. The same applies to p2. The new data frame should consist of the scaled values.

I already tried

df_scaled <- ddply(my.df, my.df$ID, scale(my.df$p1))

but get the error message

.fun is not a function.

Thanks for your help!

Hutch answered 20/1, 2017 at 10:11 Comment(0)

dplyr makes this easy:

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543,3524, 353, 3432, 4542, 6343, 4534 )
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

library(dplyr)
df_scaled <- my.df %>% group_by(ID) %>% mutate(p1 = scale(p1), p2=scale(p2))

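Note that there is a bug in the stable version of dplyr (0.5.0, per the comments) when working with scale: the scaled columns come back as one-column matrices rather than plain vectors. You might need to update to the dev version, or wrap each scale call in as.vector() (see comments).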
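For anyone who would rather avoid dplyr altogether, here is a minimal base R sketch of the same per-group scaling. It is not part of the original answer. The names scale_vec and df_scaled_base are made up for illustration, and as.vector() is used because scale() returns a one-column matrix.

```r
# Example data from the question
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534)
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

# Hypothetical helper: scale() returns a one-column matrix,
# so strip it down to a plain numeric vector
scale_vec <- function(x) as.vector(scale(x))

# ave() applies FUN within each level of the grouping variable
# and returns the results in the original row order
df_scaled_base <- my.df
df_scaled_base$p1 <- ave(my.df$p1, my.df$ID, FUN = scale_vec)
df_scaled_base$p2 <- ave(my.df$p2, my.df$ID, FUN = scale_vec)

# Sanity check: the per-ID means should be (numerically) zero
aggregate(cbind(p1, p2) ~ ID, data = df_scaled_base, FUN = mean)
```

Because ave() keeps the row order, the result lines up with my.df row for row, and the columns stay ordinary numeric vectors.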

Quenelle answered 20/1, 2017 at 10:14 Comment(8)
Or, more generically: my.df %>% group_by(ID) %>% mutate_at(vars(matches('p')), funs(scale)) - Aviles
Thank you. It works on the dataframe I put up here as an example but with the real dataframe I get the Error: unexpected '=' in "scaled_data <- predictortable_panel %>% group_by(predictortable_panel$ID) %>% mutate(predictortable_panel$p1 =". Any idea why it won't take the equal sign? - Hutch
You shouldn't repeat the name of the data frame inside the dplyr functions (i.e. remove predictortable_panel$); mutate(p1=... etc should work. - Quenelle
@Quenelle running your code gives me an issue. Not an error right away, but if I try to run View(df_scaled) afterwards I get an error: dims [product 5] do not match the length of object [15]. If I add as.vector before each of the scale calls, it fixes the problem (I thought to try because scale outputs a matrix rather than a vector). Not sure if this is universal, I'm on R-3.3.2, dplyr-0.5.0. - Undergrowth
I can confirm (by sapply(df_scaled,class)) that the columns are matrices, which I think is what View is struggling with (I thought it would be OK, since list columns usually work fine). Coercing each to vector, either in the mutate call or afterwards resolves it. - Undergrowth
Your problem might be related to this, which seems to be a bug: #35776196 - Quenelle
I was convinced this problem must be in scale rather than dplyr, but tried installing the dev version of dplyr anyway, and you're right; that fixed it. Thanks for clearing it up for me @Quenelle - Undergrowth
I edited it into the answer, for future googlers. Feel free to accept the answer if it was helpful :) - Quenelle
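
The matrix behaviour of scale discussed in the comments above can be reproduced without dplyr. This is only a small illustrative sketch. The variable names x, s and v are made up, and x is simply the p1 values for ID 1 from the question.

```r
# p1 values for ID 1, taken from the question's example data
x <- c(21000, 23400, 26800, 2345, 23464)

# scale() returns a one-column matrix carrying the centering and
# scaling values as attributes, not a plain vector
s <- scale(x)
is.matrix(s)        # TRUE
dim(s)              # 5 1
names(attributes(s))

# as.vector() drops the dim and the extra attributes
v <- as.vector(s)
is.matrix(v)        # FALSE

# In the dplyr call from the answer, the same coercion would read:
#   mutate(p1 = as.vector(scale(p1)), p2 = as.vector(scale(p2)))
```

This is the as.vector workaround Undergrowth describes. It should be harmless on dplyr versions where the bug is fixed, since the columns then simply stay plain numeric vectors.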
