Get the row number in which the value in a column changes
Asked Answered
S

6

7

I have this dataframe:

dat <- data.frame(Var1 = c(1,1,1,1,2,2,3,3,3,3,3,3,4,4,4,5,5,5,5))

> dat
   Var1
1     1
2     1
3     1
4     1
5     2
6     2
7     3
8     3
9     3
10    3
11    3
12    3
13    4
14    4
15    4
16    5
17    5
18    5
19    5

I want to get the row numbers where a new value appears for the first time.

The result should be:

c(1, 5, 7, 13, 16)

I got as far as identifying the unique values with unique(dat$Var1) and finding the first row for one value with min(which(dat$Var1 == 1)), but I don't know how to combine the two without using a loop.

Ideally I'm looking for a solution in base R, unless the solution in some package is extremely simple (e.g. some.function(dat$Var1)).

Strung answered 5/4 at 12:12 Comment(0)
P
9

If your intent is to find where a new value first appears regardless of the order, then I'll change your data slightly:

dat <- data.frame(Var1 = c(1,2,1,1,2,2,3,3,3,3,3,3,4,4,4,5,5,5,5))
###                          ^ different

With this, we'd need to know the first 1 (row 1) but not the next time the value changes to 1 (row 3).

For that we can use just !duplicated:

which(!duplicated(dat$Var1))
# [1]  1  2  7 13 16
### c(1,2,1,1,2,2,3,3,3,3,3,3,4,4,4,5,5,5,5)
###   ^ ^         ^           ^     ^

For fun, if you wanted to know the last occurrence of each number, we can do

which(!duplicated(dat$Var1, fromLast = TRUE))
# [1]  4  6 12 15 19
### c(1,2,1,1,2,2,3,3,3,3,3,3,4,4,4,5,5,5,5)
###         ^   ^           ^     ^       ^

If you want the 2nd (or nth) for each number, we can switch to using ave and seq_along:

which(ave(dat$Var1, dat$Var1, FUN = seq_along) == 2L)
# [1]  3  5  8 14 17
### c(1,2,1,1,2,2,3,3,3,3,3,3,4,4,4,5,5,5,5)
###       ^   ^     ^           ^     ^

(recognizing that this does not return when a number does not have the nth occurrence).

Polacre answered 5/4 at 12:17 Comment(0)
E
6

If consecutive values are the same, then their diff will be 0. Therefore identifying which values of the diff are not equal to zero gives you your desired result.

which(diff(c(0, dat$Var1)) != 0)
#> [1]  1  5  7 13 16
Elka answered 5/4 at 12:13 Comment(0)
P
3

We could combine cumsum and match as an alternative to which:

x <- cumsum(c(TRUE, diff(dat$Var1) != 0))

match(unique(x), x) 

[1]  1  5  7 13 16
Photoactive answered 5/4 at 12:24 Comment(0)
T
2

Another option...

tapply(seq_along(dat$Var1), dat$Var1, min)

 1  2  3  4  5 
 1  5  7 13 16 

This also gives the first value regardless of the order.

Thistly answered 5/4 at 12:27 Comment(0)
W
1

Another option:

head(cumsum(c(1, rle(dat$Var1)$lengths)),-1)
[1]  1  5  7 13 16
Wrestling answered 5/4 at 12:50 Comment(1)
Thank you for teaching me rle! And a very clever use of cumsum and head!Strung
S
1

Ideally I'm looking for a solution in base R, unless the solution in some package is extremely simple (e.g. some.function(dat$Var1)).

Here's some tidyverse shots regardless of the order:

dat <- data.frame(Var1 = c(1,2,1,1,2,2,3,3,3,3,3,4,4,4,5,5,5,5,3))
#                            ^                                 ^

# First appearance
slice_head(rowid_to_column(dat), by = Var1)$rowid
# [1]  1  2  7 12 15

# Last appearance
slice_tail(rowid_to_column(dat), by = Var1)$rowid
# [1]  4  6 19 14 18

# Nth appearance (without errors if isn't one)
nth <- 4
filter(rowid_to_column(dat), .by = Var1, row_number() == nth)
#   rowid Var1
# 1    10    3
# 2    18    5

Edit: leave the nth output as a table in order to identify which var.

Slide answered 5/4 at 13:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.