I will use the following data set to illustrate my questions:
library(dplyr)
my_df <- data.frame(
  a = 1:10,
  b = 10:1
)
colnames(my_df) <- c("a", "b")
Part 1
I use the mutate() function to create two new variables in my data set, and I would like to compute the row means of the two new columns inside the same mutate() call. However, I would really like to be able to use the select() helpers such as starts_with(), ends_with() or contains().
My first try:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    mean = rowMeans(select(ends_with("2")))
  )
Error in mutate_impl(.data, dots) :
Evaluation error: No tidyselect variables were registered.
I understand why there is an error: the select() function is not given any .data argument. So, in my second try, I changed the code by adding "." inside the select() function:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    mean = rowMeans(select(., ends_with("2")))
  )
    a  b a_2 b_2 mean
1   1 10   1 100  NaN
2   2  9   4  81  NaN
3   3  8   9  64  NaN
4   4  7  16  49  NaN
5   5  6  25  36  NaN
6   6  5  36  25  NaN
7   7  4  49  16  NaN
8   8  3  64   9  NaN
9   9  2  81   4  NaN
10 10  1 100   1  NaN
The new problem after the second try is that the mean column does not contain the mean of a_2 and b_2 as expected, but contains only NaNs. After studying the code a bit, I understood the second problem: the added "." in the select() function refers to the original my_df data frame, which does not have the a_2 and b_2 columns. So it makes sense that NaNs are produced, because I am asking R to compute the means of nonexistent values.
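That reasoning can be checked outside of mutate(). The following is only a sketch of what I believe is happening: selecting ends_with("2") from the original my_df keeps all ten rows but no columns, and rowMeans() of that is NaN for every row. The name no_cols is a placeholder made up for this illustration.

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

# The original data frame has no column ending in "2",
# so the selection keeps all 10 rows but zero columns.
no_cols <- select(my_df, ends_with("2"))
dim(no_cols)

# Averaging zero values per row amounts to 0/0, hence NaN in every row.
rowMeans(no_cols)
```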
I then tried to use dplyr functions such as current_vars() to see if it would make a difference:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    mean = rowMeans(select(current_vars(), ends_with("2")))
  )
Error in mutate_impl(.data, dots) :
Evaluation error: Variable context not set.
However, this is obviously NOT the way to use this function. The solution is to simply add a second mutate() call:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2
  ) %>%
  mutate(mean = rowMeans(select(., ends_with("2"))))
    a  b a_2 b_2 mean
1   1 10   1 100 50.5
2   2  9   4  81 42.5
3   3  8   9  64 36.5
4   4  7  16  49 32.5
5   5  6  25  36 30.5
6   6  5  36  25 30.5
7   7  4  49  16 32.5
8   8  3  64   9 36.5
9   9  2  81   4 42.5
10 10  1 100   1 50.5
Question 1: Is there any way to perform this task in the same mutate() call? Using a second mutate() call is not really an issue; however, I am curious to know whether there is a way to refer to the variables that currently exist. The mutate() function allows variables to be used as soon as they are created inside the same mutate() call; however, this becomes problematic when functions are nested, as shown in my example above.
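For what it is worth, newer dplyr releases appear to offer a way to do this in one call. A minimal sketch, assuming dplyr 1.1.0 or later is installed: pick() applies select()-style helpers from inside mutate(), and my understanding is that it works on the columns visible at that point, including ones created earlier in the same call. The name result is only a placeholder.

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

# Assumption: dplyr >= 1.1.0, which provides pick().
# pick() is evaluated inside mutate()'s data mask, so it should see
# a_2 and b_2 even though they are created in the same mutate() call.
result <- my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    mean = rowMeans(pick(ends_with("2")))
  )

result
```

If this behaves as assumed, the mean column should match the output of the two-step version above.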
Part 2
I also realize that using rowMeans() works in my solution; however, it is not really the dplyr way of doing things, especially because I need to use select() inside it. So, I decided to use the rowwise() and mean() functions instead. But once again, I would like to use one of the select() helpers for that and not have to list all variables in a c() call. I tried:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2
  ) %>%
  rowwise() %>%
  mutate(
    mean = mean(ends_with("2"))
  )
Error in mutate_impl(.data, dots) :
Evaluation error: No tidyselect variables were registered.
I suspect that the error in the code is due to the fact that ends_with() is not inside select(), but I am showing this to ask whether there is a way to list the variables I want without having to specify them individually.
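One possible shape for this, sketched under the assumption that dplyr 1.0.0 or later is available: c_across() accepts select() helpers inside a rowwise() mutate() and returns the selected values of the current row as a single vector, which mean() can then average. The name result is only a placeholder.

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

# Assumption: dplyr >= 1.0.0, which provides c_across().
result <- my_df %>%
  mutate(a_2 = a^2, b_2 = b^2) %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into one vector
  mutate(mean = mean(c_across(ends_with("2")))) %>%
  ungroup()  # drop the rowwise grouping again

result
```

Since rowwise() evaluates the expression once per row, this is likely to be slower than rowMeans() on large data frames.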
Thank you for your time.
my_df %>% mutate(a_2 = a^2, b_2 = b^2) %>% rowwise() %>% select(., ends_with("2")) is the object that you want to run means() upon, but this will never work because rowMeans() is designed to work horizontally while means() is not. – Murrain
[...] means() function belong to? And yes, I specified in the question that I am trying to compute horizontal means. This is why I used rowMeans() in the first part and a combination of rowwise() and mean() in the second part. – Boarfish
[...] mean() won't operate the way you intend it to. I was "referencing #1" because it seemed worthy of a bounty. Likely, we'll need Hadley (or someone very proficient here) to answer it :) – Murrain
[...] rowwise() and mean(); however, you need to manually specify column names in a c() function. I was just wondering if there existed a way to use one of the select helpers to perform the same task. – Boarfish