Is there an R function to extract only numbers from a comma-separated string with many NA values to create a column with only the numbers?

Asked 11/9, 2019 at 17:21 Answered 12/9, 2019 at 21:0

Solved r string na numeric

I have a dataset that looks like this:

 before = data.frame(diag1 = c(1,NA, 1, NA, NA, 1), diag2 = c(NA, NA, NA, 2, NA, NA), diag3 = c(3, NA, NA, NA, 3, 3), diag4 = c(4, 4, NA, NA, 4, NA))

  diag1 diag2 diag3 diag4
1     1    NA     3     4
2    NA    NA    NA     4
3     1    NA    NA    NA
4    NA     2    NA    NA
5    NA    NA     3     4
6     1    NA     3    NA

I have been trying to find a solution in which the end result is a new column named "diagnoses" that looks like this

  diagnoses
1     1,3,4
2         4
3         1
4         2
5       3,4
6       1,3

This is just a much smaller example of my real problem. In the dataset I am working on there are over 70 columns of diagnoses, with no more than 3 numeric values in each row. I have tried strsplit, separate, unite functions. I still haven't found an elegant solution

I have used apply paste function

dat$diagnoses<- apply( (dat[ , cols]), 1, function(x) paste(na.omit(x),collapse=", ") )

However, it yields a string with many commas.

I tried gsub to substitute the , but I still have not been able to get the results I hoped.

This is the output I have been able to get: "1,,3,4,," ",,,4,," " 1,,,,," ",2,,,," ",,3,4,," "1,,3,,,"

Deliladelilah answered 11/9, 2019 at 17:21 Comment(2)

Not able to reproduce your issue with the code as it is working fine for me. May be you have "" instead of NAs – Langland 11/9, 2019 at 17:25

Thank you! You are right, I had not converted all of the variables into numeric before doing this in my big dataset. – Deliladelilah 11/9, 2019 at 17:47

An option is to loop through the row with apply, remove the NA elements and paste it together

before$new <- apply(before, 1, function(x) toString(x[!is.na(x)]))
before$new
#[1] "1, 3, 4" "4"       "1"       "2"       "3, 4"    "1, 3"

Langland answered 11/9, 2019 at 17:22 Comment(0)

Another possibility could be:

before$rowid <- 1:nrow(before)
aggregate(values ~ rowid, 
          paste0, collapse = ",",
          data.frame(before[5], stack(before[-5])))

  rowid values
1     1  1,3,4
2     2      4
3     3      1
4     4      2
5     5    3,4
6     6    1,3

Houseman answered 11/9, 2019 at 17:26 Comment(4)

@Langland now I see. Anyway, I don't think it is a good idea to use apply() in the case of data.frames, so I will leave it here as an alternative. – Houseman 11/9, 2019 at 17:32

For rowwise, apply is the best (assuming that the columns are all numeric). Doing this in aggregate with a created grouping sequence may not add any efficiency – Langland 11/9, 2019 at 17:33

could you please clarify what this part does exactly?(before[5[, stack(before[-5]))) – Deliladelilah 11/9, 2019 at 18:33

Sure :) data.frame(before[5], stack(before[-5])) is used to create the df on which the aggregation is performed. stack(before[-5]) essentially transforms the data from wide to long format, excluding column 5 which is a newly created rowid. before[5] containing the rowid is then recycled according the length of stack(before[-5]) – Houseman 11/9, 2019 at 18:39

foo = function(..., sep = ","){
    paste(..., sep = sep)
}

gsub(",?NA|NA,?", "", do.call(foo, before))
#[1] "1,3,4" "4"     "1"     "2"     "3,4"   "1,3"

Kirkuk answered 11/9, 2019 at 17:41 Comment(0)

I didn't know toString but borrowing from @akrun and using package purrr :

purrr::pmap_chr(before, ~toString(na.omit(c(...))))
# [1] "1, 3, 4" "4"       "1"       "2"       "3, 4"    "1, 3"

Malynda answered 12/9, 2019 at 21:0 Comment(0)

Recommended topics

Hot tags