What is the equivalent of Stata function inlist() in R?

Stata's inlist allows us to refer to the real or string values of a variable. I was wondering whether R has such a function.

Examples:

I want to choose seven states from the variable state (you can think of this as the column state in any data frame, where state takes 50 string values, the states of the United States).

    inlist(state,"NC","AZ","TX","NY","MA","CA","NJ")

I want to choose nine values of age from the variable age (you can think of this as the column age in any data frame, where age takes numerical values from 0 to 90).

    inlist(age, 16, 24, 45, 54, 67, 74, 78, 79, 85)

Question:

age<-c(0:10) # for this problem age takes values from 0 to 10 only
data<-as.data.frame(age) # age is a variable of data frame data
data$m<-ifelse(c(1,7,9)%in%data$age,0,1) # intended: generate a variable m which takes value 0 if age is 1, 7, or 9, and 1 otherwise
Expected output: 
   age m
1    0 1
2    1 0
3    2 1
4    3 1
5    4 1
6    5 1
7    6 1
8    7 0
9    8 1
10   9 0
11  10 1
Voussoir answered 12/1, 2013 at 16:15 Comment(8)
I believe you might be looking for match() or %in% but am not too familiar with the inlist function from Stata. – Berglund
It would help if you defined state and age and showed the expected output ... – Bobbie
@Ananda and @Ben: Sorry for not being more explicit. I have now edited the question and I hope that it is clearer. – Voussoir
stata.com/help.cgi?inlist() is a more concise and direct source of information. In Stata inlist() is a function, and not a command. – Spotlight
https://mcmap.net/q/1317435/-equivalent-function-of-r-39-s-quot-in-quot-for-stata/1317221 – Cirrhosis
The question is still not quite clear enough. Could you please give an explicit example along with the expected output? – Bobbie
@BenBolker I have edited the question along with the expected output. – Voussoir
Edited to refer to inlist() as a Stata function. – Spotlight

I think you want %in%:

statevec <- c("NC","AZ","TX","NY","MA","CA","NJ")
state <- c("AZ","VT")
state %in% statevec ## TRUE FALSE
agevec <- c(16, 24, 45, 54, 67,74, 78, 79, 85) 
age <- c(34,45)
age %in% agevec ## FALSE TRUE

edit: working on updated question.

Copying from @NickCox's link:

inlist(z,a,b,...)
      Domain:       all reals or all strings
      Range:        0 or 1
      Description:  returns 1 if z is a member of the remaining arguments;
                        otherwise, returns 0.  All arguments must be reals
                        or all must be strings.  The number of arguments is
                        between 2 and 255 for reals and between 2 and 10 for
                        strings.

However, I'm not quite sure how this matches up with the original question. I don't know Stata well enough to know if z can be a vector or not: it doesn't sound that way, in which case the original question (considering z=state as a vector) doesn't make sense. If we consider that it can be a vector then the answer would be as.numeric(state %in% statevec) -- I think.
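
For illustration, the vectorized reading could be wrapped in a small helper. This is only a sketch: the name inlist is borrowed from Stata and is not a base R function, and it assumes z may be a vector, which may not match Stata's semantics exactly.

```r
# Hypothetical helper (not part of base R): mimics Stata's inlist(z, a, b, ...)
inlist <- function(z, ...) {
  # c(...) pools the candidate values; %in% tests each element of z against them
  as.numeric(z %in% c(...))
}

state <- c("AZ", "VT")
inlist(state, "NC", "AZ", "TX", "NY", "MA", "CA", "NJ")  # 1 0

age <- c(34, 45)
inlist(age, 16, 24, 45, 54, 67, 74, 78, 79, 85)          # 0 1
```

Unlike Stata, this sketch does not check that all arguments share a type; c() would silently coerce mixed numbers and strings to strings.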

Edit: Update by Ananda

Using your updated data, here's one approach, again using %in%:

data <- data.frame(age=0:10)
within(data, {
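    # %in% binds tighter than !, so the parentheses are only for readability
    m <- as.numeric(!(age %in% c(1, 7, 9)))  # 0 if age is 1, 7, or 9; 1 otherwise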
})
   age m
1    0 1
2    1 0
3    2 1
4    3 1
5    4 1
6    5 1
7    6 1
8    7 0
9    8 1
10   9 0
11  10 1

This matches your expected output, by using ! (NOT) to invert the sense of %in%. It seems to be a little backwards from the way I would think about it (normally, 0=FALSE="is not in the list" and 1=TRUE="is in the list") and my reading of Stata's definition, but if it's what you want ...

Or one can use ifelse for more potential flexibility (i.e. values other than 0/1): substitute within(data, { m <- ifelse(age %in% c(1, 7, 9),0,1)}) in the code above.
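
As a rough sketch of that flexibility, here is the same membership test returning codes other than 0/1. The values 10 and 50 are arbitrary and chosen purely for illustration.

```r
data <- data.frame(age = 0:10)

# 10 where age is in the list, 50 everywhere else (arbitrary illustrative codes)
data$m <- ifelse(data$age %in% c(1, 7, 9), 10, 50)

data$m
# 50 10 50 50 50 50 50 10 50 10 50
```

The yes/no arguments of ifelse can also be strings or other vectors, which as.numeric on a logical cannot give you.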

Bobbie answered 12/1, 2013 at 16:30 Comment(6)
@Ananda: I have updated the question. Can you please check that? – Voussoir
@BenBolker, Sorry about the messy edits! Couldn't keep track of all the edits to the question! ;) – Berglund
@Ben, this is a scalar function, and for a good reason, probably: I am not sure how to interpret the many-to-many matches. Should inlist( c(1,7,9),1) evaluate to TRUE? Should inlist( c(1,7,9), c(9,7,1) ) evaluate to TRUE? Should only inlist( c(1,7,9), c(1,7,9), c(2,3,5) ) evaluate to TRUE? When inlist() is encountered in the variable context (recall that Stata only works with one rectangular object called data), it is evaluated for every observation in the data set. – Medellin
Well, R uses sensible definitions for its %in% operator (if maybe not the ones you want, and maybe not exactly equivalent to inlist): c(1,7,9) %in% 1 gives TRUE FALSE FALSE; c(1,7,9) %in% c(9,7,1) gives TRUE TRUE TRUE (all three elements in the first operand match elements of the second operand). I don't know about the version with >2 arguments (%in% only allows two); I would probably make the R definition a %in% union(b,c,d,...) – Bobbie
@AnandaMahto: As far as I understand, as.numeric generates 0 or 1. But ifelse allows other values too, e.g., 10 or 50. I would like to stick with ifelse: within(data, { m <- ifelse(age %in% c(1, 7, 9),0,1) }) Thanks for the solution. – Voussoir
@BenBolker I think it would be more R-like to return a logical vector. I suspect Stata doesn't have the equivalent and so has to use 0 and 1. – Krum
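
The union() idea from the comments can be sketched as follows. The vectors b, cc, and d are made-up examples, and since base R's union() takes only two arguments, Reduce() is used here to chain it over several; this is one possible reading, not a definitive equivalent of Stata's multi-argument inlist().

```r
a  <- c(1, 7, 9)
b  <- c(9, 7)   # hypothetical candidate sets
cc <- c(2, 3)
d  <- 5

# union() accepts two vectors at a time, so fold it over the list
pool <- Reduce(union, list(b, cc, d))  # 9 7 2 3 5

res <- a %in% pool   # logical vector: FALSE TRUE TRUE
as.numeric(res)      # Stata-style 0/1: 0 1 1
```

Keeping the logical result and converting with as.numeric() only when a 0/1 variable is really needed is arguably the more R-like choice.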
