Tidyr Separate using regex

I searched and searched for this and found similar stuff but nothing quite right. Hopefully this hasn't been answered.

Let's say I have a column with Y, N, and sometimes extra information:

    df<-data.frame(Names=c("Patient1","patient2","Patient3","Patient4","patient5"),Surgery=c("Y","N","Y-this kind of surgery","See note","Y"))

And I'm trying to separate out the Y or N into one column, and everything else from that column into another.

I've tried

    df%>%separate('Surgery',c("Surgery","Notes"), sep=" ")

This ends up with one column containing "See" and the next containing "note".

    df%>%separate('Surgery',c("Surgery","Notes"), sep = '^Y|^N')

Just gets weird

    df%>%separate('Surgery',c("Surgery","Notes"), sep= "^[YN]?")

This splits the notes correctly, but removes the Y and N.

Anybody know how to separate it? The result I'm looking for would have only Y or N in the surgery column and anything else pushed to a different column.

Keto answered 22/3, 2018 at 19:31 Comment(0)

We can use extract() from tidyr:

    library(tidyr)
    library(dplyr)
    df %>% 
      extract(Surgery, into = c("Surgery", "Notes"), "^([YN]*)[[:punct:]]*(.*)")
    #     Names Surgery                Notes
    #1 Patient1       Y                     
    #2 patient2       N                     
    #3 Patient3       Y this kind of surgery
    #4 Patient4                     See note
    #5 patient5       Y                     
Rhinoplasty answered 22/3, 2018 at 19:41 Comment(1)
That's it! Thank you akrun! - Keto

Same result, but a bit lighter on regex:

    df %>%
        tidyr::separate_wider_regex(
            cols = Surgery,
            patterns = c(Surgery = "[YN]*", Notes = "[^YN]*")
        )

    # A tibble: 5 × 3
      Names    Surgery Notes                  
      <chr>    <chr>   <chr>                  
    1 Patient1 "Y"     ""                     
    2 patient2 "N"     ""                     
    3 Patient3 "Y"     "-this kind of surgery"
    4 Patient4 ""      "See note"             
    5 patient5 "Y"     ""                     

You could also play with ^ or $ if notes can contain Y or N characters.
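
For what it's worth, here is one way that idea might look. This is only a sketch, not part of the answer above: it assumes tidyr 1.3.0 or later (where separate_wider_regex is available), rebuilds the df from the question, and uses patterns of my own choosing. The Surgery pattern is limited to a single optional leading Y or N, an unnamed "-?" piece swallows the hyphen without creating a column, and Notes takes whatever is left, so the notes may contain Y or N.

```r
library(tidyr)

# Same data as in the question
df <- data.frame(
  Names = c("Patient1", "patient2", "Patient3", "Patient4", "patient5"),
  Surgery = c("Y", "N", "Y-this kind of surgery", "See note", "Y")
)

# The patterns are matched in sequence against the whole string:
#   Surgery   : an optional single leading Y or N
#   (unnamed) : an optional hyphen, matched but not kept as a column
#   Notes     : everything that remains, Y and N included
res <- separate_wider_regex(
  df,
  cols = Surgery,
  patterns = c(Surgery = "[YN]?", "-?", Notes = ".*")
)

print(res)
```

One caveat with this sketch: a note that itself starts with a capital Y or N (say "No surgery") would have its first letter split off into Surgery, so the Surgery pattern would need tightening for data like that.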

Minicam answered 30/10, 2023 at 16:26 Comment(0)
