How to convert foreach in Stata to R?

Asked 10/2, 2015 at 4:52 Answered 10/2, 2015 at 5:57

I have a data frame (df) with variables such as CA, VT, NC, AZ, CAvalue, VTvalue, NCvalue, AZvalue etc.

In Stata, I can use the foreach command and generate new variables:

foreach x in CA VT NC AZ {
    gen `x'1 = 0
    replace `x'1 = 1 if `x'value > 1
}

When I tried to convert this code to R, I ran into problems.

Here's what I wrote:

x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1

While I have no problem in creating the new variables ending with "1", I don't know how to convert the line starting with "replace". I tried to create another vector with CAtime, VTtime, NCtime, and AZtime. But I don't know how to incorporate them into the loop without writing it four times.

UPDATE: Originally, my data looks something like this:

df=as.data.frame(matrix(runif(200,1,150),ncol=8,nrow=25))
name=c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df)=name

Then I want to create 4 new variables CA1, VT1, NC1, AZ1, in a new data frame m1:

x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1

At this point, every value in m1 is 0.

Then, wherever CAtime > 1, I want the corresponding cell of CA1 to be 1. The same applies to all four variables CAtime, VTtime, NCtime, AZtime. I don't want to write four separate loops, and that's where I am stuck.

Resilience asked 10/2, 2015 at 4:52 Comment(6)

typo in second last para: CAvalue, VTvalue, NCvalue, AZvalue, instead of time. – Resilience 10/2, 2015 at 4:56

what is the expected output – Tablecloth 10/2, 2015 at 5:0

i plan to get 4 new variables CA1 VT1 NC1 AZ1. For example, if CAvalue>1, then CA1=1, else, CA1=0. My original dataset has 50 variables like this, so I can't write 50 basic loops with CAvalue>1, VTvalue>1, and so on in each loop. – Resilience 10/2, 2015 at 5:3

I am getting an error with your code, =exp required r(100); but maybe because I am running it on linux. – Sigmatism 10/2, 2015 at 5:31

But maybe that doesn't matter, I would second @Tablecloth and ask you to post what you want the output to be which should be pretty easy to figure out. – Sigmatism 10/2, 2015 at 5:33

I took the liberty of making several syntax corrections and stylistic edits to your Stata code segment, the details of which are secondary to your main question. Also, as it stands, you would put the same values into four different variables; your intent is however clear, so I've edited that too. Anyone interested in the Stata code should note that generate `x'1 = `x'value > 1 is a cleaner replacement for your two commands within the loop. – Valentine 10/2, 2015 at 11:44

Take an example dataset df, matching your description:

set.seed(1)
x <- c("CA","VT","NC","AZ")
df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),simplify=FALSE)),
      c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue","AZvalue"))
df

#  CA VT NC AZ CAvalue VTvalue NCvalue AZvalue
#1  0  2  0  1       2       1       1       2
#2  1  2  0  2       0       0       1       2
#3  1  1  2  2       1       1       1       0
#4  2  1  1  1       0       2       0       2
#5  0  0  2  2       0       1       2       1

Now use lapply to check value > 1 in each of the "value" columns, and assign the results to new variables named with a 1 appended to each prefix:

df[paste0(x,"1")] <- lapply(df[paste0(x,"value")], function(n) as.numeric(n > 1) )
df

#  CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1  0  2  0  1       2       1       1       2   1   0   0   1
#2  1  2  0  2       0       0       1       2   0   0   0   1
#3  1  1  2  2       1       1       1       0   0   0   0   0
#4  2  1  1  1       0       2       0       2   0   1   0   1
#5  0  0  2  2       0       1       2       1   0   0   1   0
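For readers coming from Stata, the same result can also be written as an explicit loop that mirrors foreach more literally. This is only a sketch on a small hand-made data frame (the name toy and its values are invented for illustration, not taken from the question); the lapply version above does the same job in one line.

```r
# Small illustrative data frame; name and values are invented for this sketch
x <- c("CA", "VT", "NC", "AZ")
toy <- data.frame(CAvalue = c(2, 0, 1),
                  VTvalue = c(1, 0, 2),
                  NCvalue = c(1, 1, 0),
                  AZvalue = c(2, 2, 0))

# Loop over the prefixes, much as `foreach x in CA VT NC AZ` does in Stata
for (s in x) {
  # toy[["CAvalue"]] > 1 is a logical vector; as.numeric() turns it into 0/1
  toy[[paste0(s, "1")]] <- as.numeric(toy[[paste0(s, "value")]] > 1)
}
toy
```

Building the column names with paste0 and indexing with [[ ]] plays the role of the `x' macro substitution in Stata.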

Designation answered 10/2, 2015 at 5:35 Comment(2)

I might also do df[paste0(x,"1")] <- (df[paste0(x, 'value')]>1)+0L – Questionable 10/2, 2015 at 5:41

@Questionable - sure. I mainly wanted to avoid such trickery when dealing with someone recently coming over from Stata. It might complicate the matter and as.numeric is probably the formal way to do this anyway. – Designation 10/2, 2015 at 5:43
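To make the difference between the two variants in these comments concrete, here is a small sketch on a made-up vector v (not data from the question). Adding 0L to a logical vector yields integers, while as.numeric yields doubles; the 0/1 values are the same either way.

```r
v <- c(0, 2, 1, 3)        # made-up values for illustration

a <- as.numeric(v > 1)    # explicit conversion: double vector of 0/1
b <- (v > 1) + 0L         # arithmetic coercion: integer vector of 0/1

typeof(a)
typeof(b)
all(a == b)               # the values agree; only the storage type differs
```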

Here is a possible option using set from data.table, which would be efficient as this updates by reference.

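library(data.table)
# x and x1 are defined in the "data" section below
setDT(df)[, (x1) := NA_real_]   # numeric placeholders, so set() needs no type coercion
x2 <- paste0(x, 'value')
indx <- match(x1, names(df))    # positions of the new columns
for (j in seq_along(x2)) {
  set(df, i = NULL, j = indx[j], value = as.numeric(df[[x2[j]]] > 1))
}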
df
#   CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1:  0  2  0  1       2       1       1       2   1   0   0   1
#2:  1  2  0  2       0       0       1       2   0   0   0   1
#3:  1  1  2  2       1       1       1       0   0   0   0   0
#4:  2  1  1  1       0       2       0       2   0   1   0   1
#5:  0  0  2  2       0       1       2       1   0   0   1   0

Update

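If we need the new columns in a separate dataset, we could subset the result above to form one. Alternatively, using a modified example that follows the update in the question (df1 and df2 are defined in the "data" section below):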

setDT(df1)
setDT(df2)
x2 <- paste0(x, 'time')
for (j in seq_along(x2)) {
  set(df2, i = NULL, j = j, value = as.numeric(df1[[x2[j]]] > 1))
}

head(df2,4)
#   CA1 VT1 NC1 AZ1
#1:   0   0   1   1
#2:   0   1   1   0
#3:   0   0   0   1
#4:   1   1   0   0
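If data.table is not needed, a separate data frame of indicators can also be built in base R. The following is a minimal sketch under assumed inputs: the data frame toy_time and its values are invented for illustration, and m1 simply reuses the name from the question.

```r
x <- c("CA", "VT", "NC", "AZ")

# Made-up "time" columns, for illustration only
toy_time <- data.frame(CAtime = c(0.5, 3),
                       VTtime = c(2, 1),
                       NCtime = c(1.5, 0),
                       AZtime = c(0, 10))

# One 0/1 column per "time" column, collected into a new data frame
m1 <- as.data.frame(lapply(toy_time[paste0(x, "time")],
                           function(v) as.numeric(v > 1)))
names(m1) <- paste0(x, "1")
m1
```

Unlike the set() loop, this does not require m1 to be pre-filled with zeros; the new data frame is created directly from the comparison.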

data

set.seed(1)
x <- c("CA","VT","NC","AZ")
x1 <- paste0(x, 1)

df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),
   simplify=FALSE)),c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue",
"AZvalue"))

set.seed(425)
df1 <- as.data.frame(matrix(rnorm(200,1,150),ncol=8,nrow=25))
name <- c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df1) <- name

df2 <- as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df1)))
colnames(df2) <- x1

Questionable answered 10/2, 2015 at 5:57 Comment(0)
