How to convert foreach in Stata to R?

I have a data frame (df) with variables such as CA, VT, NC, AZ, CAvalue, VTvalue, NCvalue, AZvalue etc.

In Stata, I can use the foreach command and generate new variables:

foreach x in CA VT NC AZ {
    gen `x'1 = 0
    replace `x'1 = 1 if `x'value > 1
}

When I tried to convert this code to R, I ran into problems.

Here's what I wrote:

x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1

I have no problem creating the new variables ending in "1", but I don't know how to translate the line starting with "replace". I tried to create another vector with CAtime, VTtime, NCtime, and AZtime, but I don't know how to incorporate them into the loop without writing it four times.

UPDATE: Originally, my data looked something like this:

df=as.data.frame(matrix(runif(200,1,150),ncol=8,nrow=25))
name=c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df)=name

Then I want to create 4 new variables, CA1, VT1, NC1, and AZ1, in a new data frame m1:

x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1

All values in m1 start out as 0.

Then, if CAtime>1, I want the corresponding cell of CA1 to be 1. The same applies to all four variables CAtime, VTtime, NCtime, and AZtime. I don't want to write four loops, and that's why I am stuck.

Resilience answered 10/2, 2015 at 4:52 Comment(6)
Typo in the second-to-last paragraph: it should read CAvalue, VTvalue, NCvalue, AZvalue instead of time. - Resilience
What is the expected output? - Tablecloth
I plan to get 4 new variables: CA1, VT1, NC1, AZ1. For example, if CAvalue>1, then CA1=1; otherwise CA1=0. My original dataset has 50 variables like this, so I can't write 50 separate loops with CAvalue>1, VTvalue>1, and so on in each one. - Resilience
I am getting an error with your code, =exp required r(100); but maybe that's because I am running it on Linux. - Sigmatism
But maybe that doesn't matter. I would second @Tablecloth and ask you to post what you want the output to be, which should be pretty easy to figure out. - Sigmatism
I took the liberty of making several syntax corrections and stylistic edits to your Stata code segment, the details of which are secondary to your main question. Also, as it stood, you would have put the same values into four different variables; your intent is clear, however, so I've edited that too. Anyone interested in the Stata code should note that generate `x'1 = `x'value > 1 is a cleaner replacement for your two commands within the loop. - Valentine
D
6

Take an example dataset df, matching your description:

set.seed(1)
x <- c("CA","VT","NC","AZ")
df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),simplify=FALSE)),
      c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue","AZvalue"))
df

#  CA VT NC AZ CAvalue VTvalue NCvalue AZvalue
#1  0  2  0  1       2       1       1       2
#2  1  2  0  2       0       0       1       2
#3  1  1  2  2       1       1       1       0
#4  2  1  1  1       0       2       0       2
#5  0  0  2  2       0       1       2       1

Now use lapply to check whether each value column is greater than 1, and assign the results to new variables with a 1 appended to the name:

df[paste0(x,"1")] <- lapply(df[paste0(x,"value")], function(n) as.numeric(n > 1) )
df

#  CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1  0  2  0  1       2       1       1       2   1   0   0   1
#2  1  2  0  2       0       0       1       2   0   0   0   1
#3  1  1  2  2       1       1       1       0   0   0   0   0
#4  2  1  1  1       0       2       0       2   0   1   0   1
#5  0  0  2  2       0       1       2       1   0   0   1   0
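
For readers who want something that reads more like the original Stata foreach, the same result can probably be reached with an explicit for loop. This is only a sketch: the small data frame is made up for illustration, and the loop variable s is a hypothetical name, not something from the question.

```r
# Made-up toy data with the same column layout as the example above
df <- data.frame(
  CA = c(0, 1), VT = c(2, 2), NC = c(0, 0), AZ = c(1, 2),
  CAvalue = c(2, 0), VTvalue = c(1, 0), NCvalue = c(1, 1), AZvalue = c(2, 2)
)

for (s in c("CA", "VT", "NC", "AZ")) {
  # Rough equivalent of the Stata line: gen `x'1 = `x'value > 1
  # [[ ]] selects a column by a name built at run time
  df[[paste0(s, "1")]] <- as.numeric(df[[paste0(s, "value")]] > 1)
}

df[c("CA1", "VT1", "NC1", "AZ1")]
```

One difference to keep in mind when porting: Stata treats a missing numeric value as larger than any number, so `x'value > 1 is true for missing values, whereas in R NA > 1 evaluates to NA. If the value columns can contain missing data, the two versions will not agree on those rows without extra handling.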
Designation answered 10/2, 2015 at 5:35 Comment(2)
I might also do df[paste0(x,"1")] <- (df[paste0(x, 'value')]>1)+0L - Questionable
@Questionable - sure. I mainly wanted to avoid such trickery when dealing with someone recently coming over from Stata. It might complicate the matter and as.numeric is probably the formal way to do this anyway. - Designation
Q
4

Here is a possible option using set from data.table, which should be efficient because it updates the columns by reference.

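library(data.table)
# x and x1 are defined in the "data" section at the end of this answer
setDT(df)[,(x1):= NA]            # add the new columns by reference
x2 <- paste0(x, 'value')
indx <- match(x1, names(df))     # positions of the new columns in df
for(j in seq_along(x2)){
  # overwrite each new column with 1 where the matching value column is > 1
  set(df, i=NULL, j=indx[j], value=as.numeric(df[[x2[j]]]>1))
}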
df
#   CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1:  0  2  0  1       2       1       1       2   1   0   0   1
#2:  1  2  0  2       0       0       1       2   0   0   0   1
#3:  1  1  2  2       1       1       1       0   0   0   0   0
#4:  2  1  1  1       0       2       0       2   0   1   0   1
#5:  0  0  2  2       0       1       2       1   0   0   1   0

Update

If we need the new columns in a separate dataset, we could subset the results to form one. Alternatively, using a modified example:

setDT(df1)
setDT(df2)
x2 <- paste0(x, 'time')
for(j in seq_along(x2)){
  set(df2, i=NULL, j=j, value=as.numeric(df1[[x2[j]]] >1))
}

  head(df2,4)
  #  CA1 VT1 NC1 AZ1
  #1:   0   0   1   1
  #2:   0   1   1   0
  #3:   0   0   0   1
  #4:   1   1   0   0
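
If data.table is not available, the separate data frame from the question's update can probably be built in base R without any loop. The sketch below uses made-up uniform data as a stand-in for df1, and it names the result m1 after the question; neither is taken from the examples in this answer.

```r
x <- c("CA", "VT", "NC", "AZ")

# Made-up stand-in for df1: four state columns plus four *time columns
set.seed(1)
df1 <- as.data.frame(matrix(runif(80, 0, 2), ncol = 8))
colnames(df1) <- c(x, paste0(x, "time"))

# Compare the whole block of *time columns to 1 at once,
# then add 0 to turn TRUE/FALSE into 1/0
m1 <- as.data.frame((df1[paste0(x, "time")] > 1) + 0)
colnames(m1) <- paste0(x, "1")

head(m1, 3)
```

Because the comparison is applied to all the selected columns together, the same two lines should work unchanged for 50 variables as long as x lists all the prefixes.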

data

set.seed(1)
x <- c("CA","VT","NC","AZ")
x1 <- paste0(x, 1)

df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),
   simplify=FALSE)),c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue",
"AZvalue"))

set.seed(425)
df1 <- as.data.frame(matrix(rnorm(200,1,150),ncol=8,nrow=25))
name <- c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df1) <- name

df2 <- as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df1)))
colnames(df2) <- x1
Questionable answered 10/2, 2015 at 5:57 Comment(0)
