Add an index (or counter) to a dataframe by group in R [duplicate]

I have a df like

ProjectID Dist
  1        x
  1        y
  2        z
  2        x
  2        h
  1        k
  ....     ....

I want to add a third column such that we have an incrementing counter for each ProjectID:

ProjectID Dist counter
  1        x     1
  1        y     2
  2        z     1
  2        x     2
  2        h     3
  1        k     3
  ....     ....

I've had a look at seq, rank, and a couple of other functions, particularly to see whether I could use ddply to help:

df$counter <- ddply(df, .(ProjectID), function(x).....? )

I think I could adapt the answer to "How to create a counter/numeration by group?", but I would prefer something like ddply. (I can't find an equivalent of cumsum, but I think the principle is the same as in "Create ascending series of integers by group in Pandas".) That would let me index occurrences in a list (and, e.g., merge on this).

Fantoccini answered 21/2, 2015 at 16:15 Comment(5)
You could try ave, i.e. df$counter <- with(df, ave(seq_along(ProjectID), ProjectID, FUN=seq_along)). A compact wrapper would be library(splitstackshape); getanID(df, 'ProjectID')[], or, using plyr, ddply(df, .(ProjectID), mutate, counter=seq_along(Dist)) - Astatine
OK, that works (thank you!), but I don't really understand what it's doing (my head hurts). - Fantoccini
We are grouping by ProjectID and creating a new column that numbers the rows of Dist within each group. You will find it easy after you read the help pages and try some examples. - Astatine
It's the use of ave that (I think) I'm finding confusing. I get the ddply example (which also works perfectly, thanks again), but I'm struggling to get my head around the use of ave alongside seq_along. - Fantoccini
In ave, the first argument is the vector to operate on and the second argument is the grouping variable; the usage is `ave(x, ..., FUN = mean)`, and the help page describes the dots as `...: Grouping variables, typically factors, all of the same 'length' as 'x'.` You can also use ave(ProjectID, ProjectID, FUN=seq_along), but when the column is character or factor, this will either result in an error or return character elements as output. - Astatine
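
To make the ave comments above concrete, here is a minimal base R sketch. The data frame is an assumption, rebuilt from the question's desired output (so the last row is taken to belong to ProjectID 1); the names dist_counter and the use of stringsAsFactors are only for illustration.

```r
# Example data reconstructed from the question's desired output
# (assumption: the final row belongs to ProjectID 1).
df <- data.frame(
  ProjectID = c(1, 1, 2, 2, 2, 1),
  Dist      = c("x", "y", "z", "x", "h", "k"),
  stringsAsFactors = FALSE
)

# seq_along(df$ProjectID) is simply 1..n, used as a numeric placeholder.
# ave() splits that vector by ProjectID, applies FUN to each piece,
# and puts the results back in the original row order.
df$counter <- ave(seq_along(df$ProjectID), df$ProjectID, FUN = seq_along)

# Passing a character column as the first argument still "works",
# but the counters come back as character strings, not numbers.
dist_counter <- ave(df$Dist, df$ProjectID, FUN = seq_along)

print(df)
print(is.character(dist_counter))
```

The first argument only supplies the length and type of the result, which is why a plain numeric sequence is a safer choice there than a character or factor column.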

A dplyr solution is quite simple:

library(dplyr)

df %>% group_by(ProjectID) %>% mutate(counter = row_number(ProjectID))


#  ProjectID Dist counter
#1         1    x       1
#2         1    y       2
#3         2    z       1
#4         2    x       2
#5         2    h       3
#6         1    k       3
Lemmuela answered 21/2, 2015 at 16:20 Comment(3)
mutate(counter=row_number()) should do it. - Astatine
This is probably a stupid question... what does %>% do? (And, slightly tangentially, is there a way to search [Google] effectively for that type of code?) - Fantoccini
%>% is a pipe or chain operator. It works like this: mydata %>% do_something_with_it %>% do_something_else. It simply enables you to chain functions together. - Lemmuela
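
As a rough illustration of the pipe explanation above, the sketch below writes the same grouped counter twice, once piped and once as nested calls. The data frame is again an assumption reconstructed from the question's desired output, and piped and nested are hypothetical names.

```r
library(dplyr)  # provides %>%, group_by(), mutate(), row_number()

# Example data (assumption: rebuilt from the question's desired output).
df <- data.frame(
  ProjectID = c(1, 1, 2, 2, 2, 1),
  Dist      = c("x", "y", "z", "x", "h", "k")
)

# Piped form: each step receives the result of the previous one
# as its first argument, so it reads left to right.
piped <- df %>%
  group_by(ProjectID) %>%
  mutate(counter = row_number())

# Nested form: exactly the same calls, read from the inside out.
nested <- mutate(group_by(df, ProjectID), counter = row_number())

# Both forms should produce the same counter column.
print(identical(piped$counter, nested$counter))
```

In other words, x %>% f(y) is just another way of writing f(x, y); the pipe changes how the code reads, not what it computes.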