dplyr filter by the first column

Asked 25/9, 2017 at 3:22 Answered 17/6, 2024 at 19:14

r filter dplyr

Is it possible to filter in dplyr by the position of a column?

I know how to do it without dplyr

iris[iris[,1]>6,]

But how can I do it in dplyr?

Thanks!

Rankin asked 25/9, 2017 at 3:22

I really don't know if it is a good way, let alone the best way, but iris %>% filter(select(.,1) > 6) maybe? – Nidanidaros 25/9, 2017 at 3:26

Or iris %>% filter(.[[1]] > 6) – Hardy 25/9, 2017 at 3:36

Besides the suggestion by @thelatemail, you can also use filter_at() and pass the column number to its .vars argument:

iris %>% filter_at(1, all_vars(. > 6))

all(iris %>% filter_at(1, all_vars(. > 6)) == iris[iris[,1] > 6, ])
# [1] TRUE
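filter_at() is superseded in dplyr >= 1.0.0. Here is a minimal sketch, assuming dplyr >= 1.0.4 (where if_all() exists), that runs the scoped call next to its if_all() replacement. Both select column 1 by position and should keep the same rows. The object names old_way and new_way are illustrative only.

```r
library(dplyr)

# Superseded scoped verb: column 1 selected by position via .vars
old_way <- iris %>% filter_at(1, all_vars(. > 6))

# Modern replacement: if_all() takes the same positional selection
new_way <- iris %>% filter(if_all(1, ~ .x > 6))

# Both should keep the same rows
nrow(old_way) == nrow(new_way)
```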

Fowler answered 25/9, 2017 at 3:35

No magic: just use the column position, as in the base R version above, rather than the variable (column) name:

library("dplyr")

iris %>%
  filter(iris[,1] > 6)

Which, as @eipi10 commented, is better written as:

iris %>%
  filter(.[[1]] > 6)

Reconstruct answered 25/9, 2017 at 3:34

Probably should be filter(.[,1] > 6). It doesn't matter here, but in general if you've changed the initial data frame with other piped functions before the filter, filter(iris[,1] > 6) will reach outside the pipe to the original data frame, rather than use the piped data frame. – Hardy 25/9, 2017 at 3:41

Just as an example where these two are not comparable - iris %>% mutate(Sepal.Length=0) %>% filter(iris[,1] > 6) vs iris %>% mutate(Sepal.Length=0) %>% filter(.[,1] > 6) – Nidanidaros 25/9, 2017 at 3:45
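A small sketch of the scoping difference described in these comments, plus a related tibble caveat. On a tibble, .[, 1] returns a one-column tibble rather than a vector, which is one reason .[[1]] is the safer positional form. The intermediate object names are hypothetical. The code uses the magrittr pipe %>% (re-exported by dplyr), because the . placeholder does not work with the native |> pipe.

```r
library(dplyr)

# Overwrite the first column inside the pipe (all zeros)
piped <- iris %>% mutate(Sepal.Length = 0)

# iris[, 1] is looked up in the global iris, so the mutate() is ignored
outside <- piped %>% filter(iris[, 1] > 6)

# .[[1]] is the first column of the piped data, which is now all zeros
inside <- piped %>% filter(.[[1]] > 6)

nrow(outside)  # same count as nrow(iris[iris[, 1] > 6, ])
nrow(inside)   # 0

# On a tibble, [, 1] stays a one-column tibble, while [[1]] is a plain vector
tb <- as_tibble(iris)
class(tb[, 1])
class(tb[[1]])  # "numeric"
```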

dplyr >= 1.0.0

Scoped verbs (_if, _at, _all) and, by extension, all_vars() and any_vars() have been superseded by across(). For filter(), the functions if_any() and if_all() were added to combine logic across multiple columns when subsetting (they are available in dplyr >= 1.0.4):

if_any() and if_all() apply the same predicate function to a selection of columns and combine the results into a single logical vector.

The first argument to across(), if_any(), and if_all() is still tidy-select syntax for column selection, which includes selection by column position.
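A minimal sketch of a few other tidy-select position forms inside if_all() and if_any(): a range of positions, a negative (excluding) position, and last_col() with an offset. The thresholds are arbitrary and chosen only for illustration.

```r
library(dplyr)

# Range of positions: columns 3 and 4 (Petal.Length, Petal.Width) must both exceed 2
range_sel <- iris %>% filter(if_all(3:4, ~ .x > 2))

# Negative position: every column except 5 (Species); any value above 7 keeps the row
neg_sel <- iris %>% filter(if_any(-5, ~ .x > 7))

# last_col(offset = 1) is the second-to-last column (Petal.Width)
last_sel <- iris %>% filter(if_any(last_col(offset = 1), ~ .x > 2.3))
```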

Single Column

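In your single-column case, any of these three gives the same result (in dplyr >= 1.0.8, however, using across() inside filter() is deprecated and emits a warning, so if_any() or if_all() is preferred):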

iris %>% 
  filter(across(1, ~ . > 6))

iris %>% 
  filter(if_any(1, ~ . > 6))

iris %>% 
  filter(if_all(1, ~ . > 6))
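If the column position is stored in a variable (col_pos below is a hypothetical name), here is a sketch that relies on tidyselect's all_of(), which accepts numeric locations as well as names. The .[[col_pos]] form from the earlier answers also works with a variable.

```r
library(dplyr)

col_pos <- 1  # hypothetical variable holding the column position

# all_of() accepts numeric locations, so the position can come from a variable
by_select <- iris %>% filter(if_any(all_of(col_pos), ~ .x > 6))

# Equivalent base-style extraction from the piped data
by_index <- iris %>% filter(.[[col_pos]] > 6)
```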

Multiple Columns

If you're applying a predicate function or formula across multiple columns, across() might give unexpected results; in that case, use if_any() or if_all():

iris %>% 
  filter(if_all(c(2, 4), ~ . > 2.3)) # by column position

  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1          6.3         3.3          6.0         2.5 virginica
2          7.2         3.6          6.1         2.5 virginica
3          5.8         2.8          5.1         2.4 virginica
4          6.3         3.4          5.6         2.4 virginica
5          6.7         3.1          5.6         2.4 virginica
6          6.7         3.3          5.7         2.5 virginica

Notice this returns rows where all selected columns have a value greater than 2.3, which is a subset of the rows where any of the selected columns meets the condition:

iris %>% 
  filter(if_any(ends_with("Width"), ~ . > 2.3)) # same column selection as above

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1           5.1         3.5          1.4         0.2    setosa
2           4.9         3.0          1.4         0.2    setosa
3           4.7         3.2          1.3         0.2    setosa
4           4.6         3.1          1.5         0.2    setosa
5           5.0         3.6          1.4         0.2    setosa
6           6.7         3.3          5.7         2.5 virginica
7           6.7         3.0          5.2         2.3 virginica
8           6.3         2.5          5.0         1.9 virginica
9           6.5         3.0          5.2         2.0 virginica
10          6.2         3.4          5.4         2.3 virginica
11          5.9         3.0          5.1         1.8 virginica

The output above was shortened to keep this example compact.
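Because if_any() and if_all() each return one logical vector, they also work inside mutate(). Here is a minimal sketch that stores both flags for the same positional selection c(2, 4) and checks the subset relationship described above. The column names all_wide and any_wide are hypothetical.

```r
library(dplyr)

flags <- iris %>%
  mutate(
    all_wide = if_all(c(2, 4), ~ .x > 2.3),  # TRUE when both columns exceed 2.3
    any_wide = if_any(c(2, 4), ~ .x > 2.3)   # TRUE when at least one column does
  )

# Every row flagged by if_all() must also be flagged by if_any()
all(flags$any_wide[flags$all_wide])

# Row counts for the two conditions
sum(flags$all_wide)
sum(flags$any_wide)
```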

Falzetta answered 26/3, 2021 at 18:42

Not very elegant, but you can rename the variable and use the new name in a dplyr pipe.

iris_copy <- iris
original_names <- names(iris_copy)

Renaming the first variable:

names(iris_copy)[1] <- "col1"

Filtering the first variable:

iris_copy |> filter(col1 > 6)

If you need the original variable names back:

names(iris_copy) <- original_names
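A less manual variant of the same idea, as a sketch: dplyr's rename() accepts tidy-select positions, so rename(col1 = 1) renames the first column inside the pipe without copying iris or saving its names. It uses the native |> pipe like the answer above (R >= 4.1). col1 is the same placeholder name.

```r
library(dplyr)

# Rename column 1 (whatever it is called) to col1, then filter on the new name
result <- iris |>
  rename(col1 = 1) |>
  filter(col1 > 6)

# Optionally restore the original first-column name on the result
names(result)[1] <- names(iris)[1]

head(result)
```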

However answered 17/6, 2024 at 19:14
