Looping through variable names in R
I'm having a looping issue. It should be simple to solve, but "R for Stata Users" (I've coded in Stata for a couple of years), Roger Peng's videos, and Google don't seem to be helping me. Can one of you please explain to me what I'm doing wrong?

I'm trying to write a loop that runs through the 'thresholds' data frame to pull out information from three sets of columns. I can do what I want by writing the same segment of code three times, but as the code gets more complicated, this will become quite cumbersome.

Here is a sample of 'thresholds' (see dput output below, added by a friendly reader):

    threshold_1_name      threshold_1_dir threshold_1_value
1   overweight            >                25
2   possible malnutrition <                31
3   Q1                    >                998
4   Q1                    >                998
5   Q1                    >                998
6   Q1                    >                998
    threshold_1_units threshold_2_name threshold_2_dir threshold_2_value threshold_2_units
1   kg/m^2            obese               >             30                kg/m^2
2   cm                <NA>                >             NA                   
3   <NA>              Q3                  >             998                  
4                     Q3                  >             998                  
5                     Q3                  >             998                  
6                     Q3                  >             998  

This code does what I want to do:

newvars1 <- paste(thresholds$varname, thresholds$threshold_1_name, sep = "_")
noval <- is.na(thresholds$threshold_1_value)
newvars1 <- newvars1[!noval]

newvars2 <- paste(thresholds$varname, thresholds$threshold_2_name, sep = "_")
noval <- is.na(thresholds$threshold_2_value)
newvars2 <- newvars2[!noval]

newvars3 <- paste(thresholds$varname, thresholds$threshold_3_name, sep = "_")
noval <- is.na(thresholds$threshold_3_value)
newvars3 <- newvars3[!noval]

And here is how I am trying to loop:

variables <- NULL
for (i in 1:3) {
  valuevar <- paste("threshold", i, "value", sep = "_")
  namevar <- paste("threshold", i, "name", sep = "_")
  newvar <- paste("varnames", i, sep = "")
  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check == FALSE) {
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }
  variables <- c(variables, newvars)
}

And here is the error I am receiving:

Error: unexpected '}' in "}"

I think something about the way I am calling the 'i' is messing things up, but I'm not sure how to do it correctly. My Stata habits using locals are really biting me in the butt as I switch to R.

EDIT to add dput output, by a friendly reader:

thresholds <- structure(list(varname = structure(1:6, .Label = c("varA", "varB", 
"varC", "varD", "varE", "varF"), class = "factor"), threshold_1_name = c("overweight", 
"possible malnutrition", "Q1", "Q1", "Q1", "Q1"), threshold_1_dir = c(">", 
"<", ">", ">", ">", ">"), threshold_1_value = c(25L, 31L, 998L, 
998L, 998L, 998L), threshold_1_units = c("kg/m^2", "cm", NA, 
NA, NA, NA), threshold_2_name = c("obese", "<NA>", "Q3", "Q3", 
"Q3", "Q3"), threshold_2_dir = c(">", ">", ">", ">", ">", ">"
), threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L), threshold_2_units = c("kg/m^2", 
"cm", NA, NA, NA, NA)), .Names = c("varname", "threshold_1_name", 
"threshold_1_dir", "threshold_1_value", "threshold_1_units", 
"threshold_2_name", "threshold_2_dir", "threshold_2_value", "threshold_2_units"
), row.names = c(NA, -6L), class = "data.frame")
Chapeau answered 21/12, 2012 at 21:16 Comment(4)
The immediate error is that you are missing an end-paren on the line for (j in 1:length(thresholds$varname) {. - Hogweed
@BlueMagister I don't see that. Line 11 of his code contains the closer for that. - Drunkard
@BrandonBertelsen Line 11 closes the curly brace, but there is no closing parenthesis for the for statement. - Hogweed
Can you provide a sample of the data frame you are using? Something like copy-pasting dput(head(thresholds))? See here for making a good reproducible example. - Hogweed
D
6

The first problem I see is in if(check = "FALSE"): a single = is assignment, and to test a condition you need ==. Also, quoting "FALSE" makes it a character string (literally the word FALSE) rather than the logical value, which is FALSE without the quotation marks.

The second problem, as @BlueMagister rightly pointed out, is that you're missing a ) at the end of for (j in 1:length(...)) {

See # bad!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check = "FALSE") { # bad!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }

See # good!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check == FALSE) { # good!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }

But because it's an if statement you can use really simple logic, especially on logicals (TRUE / FALSE values).

See # better!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (!check) { # better!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }
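
To make the difference between the three conditions concrete, here is a minimal sketch. The variable names (eq_logical, eq_string, negated, parsed) are only for illustration. It assumes nothing beyond base R, and it relies on the coercion rule quoted in the comments below (a logical compared with a character string is coerced to character). Note that R refuses a bare = inside the if condition when it parses the code, so that version never runs at all.

```r
check <- FALSE

# `==` compares values and returns a logical.
eq_logical <- check == FALSE

# Comparing with the string "FALSE" also gives TRUE, because the logical
# is coerced to character before the comparison.
eq_string <- check == "FALSE"

# Negation is the shortest way to test for a FALSE value.
negated <- !check

# A single `=` inside the condition is rejected at parse time;
# try() captures the syntax error instead of stopping the script.
parsed <- try(parse(text = 'if (check = "FALSE") 1'), silent = TRUE)

eq_logical                      # TRUE
eq_string                       # TRUE
negated                         # TRUE
inherits(parsed, "try-error")   # TRUE
```

So == "FALSE" happens to work here, but !check says what you mean without depending on coercion.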
Drunkard answered 21/12, 2012 at 21:33 Comment(7)
Since check is logical, "FALSE" is converted to logical. FALSE=="FALSE" #TRUE - Kellam
Thank you for explaining the problem. This explanation is very clear and helpful! - Chapeau
@Struggling_with_R: if this answered your question, please consider clicking the check mark to signal to other users that this is the best answer. - Kellam
@JoshuaUlrich: I think it's the reverse, since "FALSE" is character, the logical check is converted to character, and the logical FALSE then becomes "FALSE" - Albrecht
@Aaron: correct, "If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw." - Kellam
Where is that quote from @JoshuaUlrich? - Drunkard
@BrandonBertelsen: ?"==", of course. :) Third-to-last paragraph of the Details section. - Kellam
P
1

There is obviously a missing bracket in your for loop. You should consider using an editor that supports brace matching to avoid that kind of error.

Phantasy answered 21/12, 2012 at 21:24 Comment(0)
A
0

I think the easiest thing to do would be to just write a function that does what your desired non-looping code does. For reference, here's the output from that code, using the dput output from the edit to your question.

> newvars1 <- paste(thresholds$varname, thresholds$threshold_1_name, sep = "_")
> newvars1 <- newvars1[!is.na(thresholds$threshold_1_value)]
> newvars2 <- paste(thresholds$varname, thresholds$threshold_2_name, sep = "_") 
> newvars2 <- newvars2[!is.na(thresholds$threshold_2_value)]
> c(newvars1, newvars2)
 [1] "varA_overweight"            "varB_possible malnutrition"
 [3] "varC_Q1"                    "varD_Q1"                   
 [5] "varE_Q1"                    "varF_Q1"                   
 [7] "varA_obese"                 "varC_Q3"                   
 [9] "varD_Q3"                    "varE_Q3"                   
[11] "varF_Q3"  

Here's what that function would look like:

unlist(lapply(1:2, function(k) {
  newvars <- paste(thresholds$varname, 
                   thresholds[[paste("threshold", k, "name", sep="_")]], sep = "_")
  newvars <- newvars[!is.na(thresholds[[paste("threshold", k, "value", sep="_")]])]
}))
# [1] "varA_overweight"            "varB_possible malnutrition"
# [3] "varC_Q1"                    "varD_Q1"                   
# [5] "varE_Q1"                    "varF_Q1"                   
# [7] "varA_obese"                 "varC_Q3"                   
# [9] "varD_Q3"                    "varE_Q3"                   
#[11] "varF_Q3"  
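
If you'd rather have a named function you can reuse, the same idea could be sketched like this. The function name threshold_names and the small toy data frame are made up for illustration; the only assumption is that the data frame has a varname column plus threshold_<k>_name and threshold_<k>_value columns, as in the question.

```r
# Hypothetical helper: build "<varname>_<threshold name>" for threshold set k,
# keeping only rows whose threshold value is not NA.
threshold_names <- function(df, k) {
  name_col  <- paste("threshold", k, "name", sep = "_")
  value_col <- paste("threshold", k, "value", sep = "_")
  keep <- !is.na(df[[value_col]])
  paste(df$varname[keep], df[[name_col]][keep], sep = "_")
}

# Toy data (not the asker's data frame), two threshold sets.
toy <- data.frame(
  varname           = c("varA", "varB"),
  threshold_1_name  = c("overweight", "possible malnutrition"),
  threshold_1_value = c(25L, 31L),
  threshold_2_name  = c("obese", NA),
  threshold_2_value = c(30L, NA),
  stringsAsFactors  = FALSE
)

# Apply the helper to each threshold set and flatten the result.
result <- unlist(lapply(1:2, threshold_names, df = toy))
result
# [1] "varA_overweight"            "varB_possible malnutrition"
# [3] "varA_obese"
```

Filtering with keep before pasting gives the same result as pasting first and subsetting afterwards; it just avoids building names for rows that get dropped.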

I tried to figure out what was going on in your loop but there was a lot in there that didn't make sense to me; here's how I'd write it if I was going to loop in that way.

variables <- NULL
for (i in 1:2) {
  valuevar <- paste("threshold", i, "value", sep = "_")
  namevar <- paste("threshold", i, "name", sep = "_")
  newvars <- c()
  for (j in 1:nrow(thresholds)) { 
    if (!is.na(thresholds[[valuevar]][j])) {
      newvars <- c(newvars, paste(thresholds$varname[j], 
                                  thresholds[[namevar]][j], sep = "_"))
    }
  }
  variables <- c(variables, newvars)
}
variables
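
The main change from your loop is [[ ]] instead of [ ] when the column name is stored in a variable, which is roughly what a Stata local would do for you. A small sketch of the difference, using a made-up two-row data frame (df, single and column are illustrative names only):

```r
# Toy data frame (hypothetical, not the asker's data).
df <- data.frame(
  varname           = c("varA", "varB"),
  threshold_1_value = c(25L, NA),
  stringsAsFactors  = FALSE
)

# Build the column name as a string, as in the loop above.
valuevar <- paste("threshold", 1, "value", sep = "_")

single <- df[valuevar]     # single brackets: a one-column data frame
column <- df[[valuevar]]   # double brackets: the column as a plain vector

is.data.frame(single)      # TRUE
is.vector(column)          # TRUE

# is.na() on the data frame returns a whole matrix of results, which is
# not a single TRUE/FALSE that `if` can use.
length(is.na(single))      # 2

# Indexing the vector by row gives exactly one value to test.
is.na(column[2])           # TRUE
```

Note also that valuevar is a single string, so valuevar[j] only makes sense for j equal to 1; the row index belongs on the column, as in thresholds[[valuevar]][j].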
Albrecht answered 21/12, 2012 at 21:36 Comment(0)
