Scale all values depending on group [duplicate]

Asked 20/1, 2017 at 10:11 Answered 20/1, 2017 at 10:14

I have a dataframe similar to this one

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543,3524, 353, 3432, 4542, 6343, 4534 )
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

Now I would like to scale the values in p1 and p2 separately for each ID. That is, rather than scaling the whole column at once (as happened when I tried the tapply() function), the values belonging to ID 1 should be scaled together, then those belonging to ID 2, and so on. The same applies to p2. The new data frame should contain the scaled values.

I already tried

df_scaled <- ddply(my.df, my.df$ID, scale(my.df$p1))

but get the error message

.fun is not a function.

Thanks for your help!
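
The error message arises because ddply() expects a function as its third argument (.fun), whereas scale(my.df$p1) is evaluated on the spot and hands over the resulting matrix. A minimal sketch of one way the plyr call could be written, assuming plyr is installed and using base R's transform() as the per-group function (the as.vector() wrapping is an assumption added here to keep the columns plain numeric vectors):

```r
library(plyr)

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534)
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

# Split my.df by ID, apply transform() to each piece, and bind the pieces back together.
# transform() is passed as a function; the named arguments are evaluated per group.
df_scaled <- ddply(my.df, "ID", transform,
                   p1 = as.vector(scale(p1)),
                   p2 = as.vector(scale(p2)))
```

Each group of p1 and p2 should then have mean 0 and standard deviation 1.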

Hutch answered 20/1, 2017 at 10:11 Comment(0)

dplyr makes this easy:

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543,3524, 353, 3432, 4542, 6343, 4534 )
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

library(dplyr)
df_scaled <- my.df %>% group_by(ID) %>% mutate(p1 = scale(p1), p2 = scale(p2))

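Note that scale() returns a one-column matrix rather than a vector, and the stable version of dplyr (0.5.0, according to the comments) has a bug when working with it; you might need to update to the dev version, or wrap each scale() call in as.vector() (see comments).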
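
The as.vector() workaround reported in the comments can be sketched as follows. This is a minimal sketch that assumes the same example data as above; the trailing ungroup() is an addition here, not part of the original answer.

```r
library(dplyr)

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534)
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

df_scaled <- my.df %>%
  group_by(ID) %>%
  # as.vector() drops the matrix dimensions that scale() attaches to its result
  mutate(p1 = as.vector(scale(p1)),
         p2 = as.vector(scale(p2))) %>%
  ungroup()

# Inspect the column classes; the scaled columns should no longer be matrices
sapply(df_scaled, class)
```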

Quenelle answered 20/1, 2017 at 10:14 Comment(8)

Or, more generically: my.df %>% group_by(ID) %>% mutate_at(vars(matches('p')), funs(scale)) – Aviles 20/1, 2017 at 10:17
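
For readers on a recent dplyr: funs() has since been deprecated and mutate_at() superseded, so the generic form above would probably be written with across() today. A hedged sketch, assuming dplyr 1.0.0 or later and that every column to scale has a name starting with "p" (starts_with() is used here instead of matches('p'), which would also pick up any column merely containing a "p"):

```r
library(dplyr)

my.df <- data.frame(
  ID = rep(c(1, 2, 3), each = 5),
  p1 = c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534),
  p2 = c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
)

df_scaled <- my.df %>%
  group_by(ID) %>%
  # Apply the same per-group scaling to every selected column;
  # as.vector() keeps the results as plain numeric vectors
  mutate(across(starts_with("p"), ~ as.vector(scale(.x)))) %>%
  ungroup()
```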

Thank you. It works on the dataframe I put up here as an example but with the real dataframe I get the Error: unexpected '=' in "scaled_data <- predictortable_panel %>% group_by(predictortable_panel$ID) %>% mutate(predictortable_panel$p1 =" --- any idea why it won't take the equal sign? – Hutch 20/1, 2017 at 10:30

You shouldn't repeat the name of the data frame inside the dplyr functions (i.e. remove predictortable_panel$); mutate(p1=... etc should work. – Quenelle 20/1, 2017 at 10:54

@Quenelle running your code gives me an issue. Not an error right away, but if I try to run View(df_scaled) afterwards I get an error: dims [product 5] do not match the length of object [15]. If I add as.vector before each of the scale calls, it fixes the problem (I thought to try because scale outputs a matrix rather than a vector). Not sure if this is universal, I'm on R-3.3.2, dplyr-0.5.0. – Undergrowth 20/1, 2017 at 11:40

I can confirm (by sapply(df_scaled,class)) that the columns are matrices, which I think is what View is struggling with (I thought it would be OK, since list columns usually work fine). Coercing each to vector, either in the mutate call or afterwards resolves it. – Undergrowth 20/1, 2017 at 11:46
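
The matrix-versus-vector point made in these comments can be checked in base R alone. A small sketch using the p1 values for ID 1 from the question (the variable names x, scaled and plain are only illustrative):

```r
x <- c(21000, 23400, 26800, 2345, 23464)

scaled <- scale(x)
is.matrix(scaled)          # TRUE: scale() returns a one-column matrix
dim(scaled)                # 5 1
names(attributes(scaled))  # "dim" plus the centering and scaling attributes

# as.vector() strips the dimensions and the extra attributes
plain <- as.vector(scaled)
is.matrix(plain)           # FALSE
```

The numeric values are unchanged by the coercion; only the matrix structure is dropped.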

Your problem might be related to this, which seems to be a bug: #35776196 – Quenelle 20/1, 2017 at 11:48

I was convinced this problem must be in scale rather than dplyr, but tried installing the dev version of dplyr anyway, and you're right; that fixed it. Thanks for clearing it up for me @Quenelle – Undergrowth 20/1, 2017 at 12:33

I edited it into the answer, for future googlers. Feel free to accept the answer if it was helpful :) – Quenelle 20/1, 2017 at 15:30
