Is it possible to filter in dplyr by the position of a column?
I know how to do it without dplyr:
iris[iris[,1]>6,]
But how can I do it in dplyr?
Thanks!
Besides the suggestion by @thelatemail, you can also use filter_at and pass the column number as its .vars argument:
iris %>% filter_at(1, all_vars(. > 6))
all(iris %>% filter_at(1, all_vars(. > 6)) == iris[iris[,1] > 6, ])
# [1] TRUE
No magic, just use the column number as above, rather than the variable (column) name:
library("dplyr")
iris %>%
  filter(iris[,1] > 6)
Which, as @eipi10 commented, is better written as
iris %>%
  filter(.[[1]] > 6)
filter(.[,1] > 6). It doesn't matter here, but in general, if you've changed the initial data frame with other piped functions before the filter, filter(iris[,1] > 6) will reach outside the pipe to the original data frame rather than use the piped data frame. – Hardy
iris %>% mutate(Sepal.Length=0) %>% filter(iris[,1] > 6) vs iris %>% mutate(Sepal.Length=0) %>% filter(.[,1] > 6) – Nidanidaros
Scoped verbs (_if, _at, _all) and, by extension, all_vars() and any_vars() have been superseded by across(). For filter, the functions if_any and if_all were created to combine logic across multiple columns to aid in subsetting (these verbs are available in dplyr >= 1.0.4):
if_any() and if_all() are used to apply the same predicate function to a selection of columns and combine the results into a single logical vector.
The first argument to across, if_any, and if_all is still tidy-select syntax for column selection, which includes selection by column position.
Single Column
In your single-column case, any of the following gives the same result:
iris %>%
filter(across(1, ~ . > 6))
iris %>%
filter(if_any(1, ~ . > 6))
iris %>%
filter(if_all(1, ~ . > 6))
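When the position is held in a variable rather than typed inline, wrapping it in all_of() should keep the selection unambiguous. The following is a minimal sketch, assuming dplyr >= 1.0.4; the names pos, by_position and base_result are made up for illustration. It also checks the result against the base R subset from the question.

```r
library(dplyr)

# Hypothetical variable holding the column position
pos <- 1

# Select the column by position via all_of(), then apply the predicate
by_position <- iris %>% filter(if_all(all_of(pos), ~ .x > 6))

# Base R equivalent from the question
base_result <- iris[iris[, pos] > 6, ]

# Same rows, in the same order, as the base R subset
nrow(by_position) == nrow(base_result)
all(by_position == base_result)
```

With a single column selected, if_any() and if_all() are interchangeable, so either would do here.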
Multiple Columns
If you're applying a predicate function or formula across multiple columns, across might give unexpected results; in this case you should use if_any and if_all:
iris %>%
filter(if_all(c(2, 4), ~ . > 2.3)) # by column position
  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1          6.3         3.3          6.0         2.5 virginica
2          7.2         3.6          6.1         2.5 virginica
3          5.8         2.8          5.1         2.4 virginica
4          6.3         3.4          5.6         2.4 virginica
5          6.7         3.1          5.6         2.4 virginica
6          6.7         3.3          5.7         2.5 virginica
Notice this returns rows where all selected columns have a value greater than 2.3, which is a subset of rows where any of the selected columns meet the logic:
iris %>%
filter(if_any(ends_with("Width"), ~ . > 2.3)) # same columns selection as above
   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1           5.1         3.5          1.4         0.2    setosa
2           4.9         3.0          1.4         0.2    setosa
3           4.7         3.2          1.3         0.2    setosa
4           4.6         3.1          1.5         0.2    setosa
5           5.0         3.6          1.4         0.2    setosa
6           6.7         3.3          5.7         2.5 virginica
7           6.7         3.0          5.2         2.3 virginica
8           6.3         2.5          5.0         1.9 virginica
9           6.5         3.0          5.2         2.0 virginica
10          6.2         3.4          5.4         2.3 virginica
11          5.9         3.0          5.1         1.8 virginica
The output above was shortened to keep this example compact.
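The subset relationship between the two results can be checked directly. This is a rough sketch, assuming dplyr >= 1.0.4; width_cols, all_rows and any_rows are names introduced only for this illustration.

```r
library(dplyr)

# Sepal.Width and Petal.Width, selected by position
width_cols <- c(2, 4)

all_rows <- iris %>% filter(if_all(all_of(width_cols), ~ .x > 2.3))
any_rows <- iris %>% filter(if_any(all_of(width_cols), ~ .x > 2.3))

# if_all() is at least as strict as if_any()
nrow(all_rows) <= nrow(any_rows)

# Every if_all() row also appears among the if_any() rows,
# so the anti-join leaves nothing behind
nrow(anti_join(all_rows, any_rows, by = names(iris))) == 0
```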
Not very elegant, but you can rename the variable and use the new name in a dplyr pipe.
iris_copy <- iris
original_names <- names(iris_copy)
Renaming the first variable:
names(iris_copy)[1] <- "col1"
Filtering the first variable:
iris_copy |> filter(col1 > 6)
If you need the original variable name:
names(iris_copy) <- original_names
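The same idea can probably be kept inside one pipeline, since rename() accepts a column position on its right-hand side. A sketch under that assumption follows; col1 and result are illustrative names, and the native pipe needs R >= 4.1.

```r
library(dplyr)

result <- iris |>
  rename(col1 = 1) |>            # rename the first column by position
  filter(col1 > 6) |>            # filter on the temporary name
  rename(Sepal.Length = col1)    # restore the original name

names(result)
```

This avoids modifying a copy of the data frame, at the cost of hard-coding the original name in the last step.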
iris %>% filter(select(.,1) > 6) maybe? – Nidanidaros
iris %>% filter(.[[1]] > 6) – Hardy