I have an ID variable that comes from 35 different hospitals, so it is formatted in varying ways, and sometimes the same root ID number has a secondary line number appended, e.g. -1, /a, _1 etc.
I want to remove the punctuation, and whatever comes after that punctuation, leaving just the root ID number.
So far I have written out individual lines of code for each variant, but I was wondering whether there is a more elegant way, so that when next year's data comes in I don't need to check for new arrangements?
On someone else's question I found a way to remove brackets and all the text within them, but I can't figure out how to adapt it for my purposes:
df$patid<- gsub("\\s*\\([^\\)]+\\)","",df$patid)
I tried these two lines without success:
df$patid<- gsub("\\[:punct:]s*$","", df$patid)
df$patid<- gsub("\\[:alnum:]s*$","", df$patid)
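A possible reason these match nothing, as far as I can tell: POSIX classes such as [:punct:] only work inside a bracket expression, so they need doubled brackets, and the escaped \\[ makes the pattern look for the literal text "[:punct:]". A minimal sketch of the difference, using three of my IDs (the names ids, unchanged and stripped are just for illustration):

```r
# Three IDs taken from the example data in this question.
ids <- c("570548260-2", "8253015N/2", "G140381883_1")

# The attempted pattern: "\\[" is a literal "[", so this searches for the
# literal text "[:punct:]" (plus optional "s") at the end and finds nothing.
unchanged <- gsub("\\[:punct:]s*$", "", ids)

# Doubled brackets: one punctuation character followed by any number of
# letters or digits at the end of the string.
stripped <- sub("[[:punct:]][[:alnum:]]*$", "", ids)

print(unchanged)
print(stripped)
```

Note that this version removes a final segment of any length, so a root ID such as MB-13-169454 would be cut down to MB-13; some limit on the suffix length seems to be needed.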
I also tried the clean function, which removed all the punctuation but kept the numbers/characters after it, so that wasn't it.
Example of my current code (not all possible iterations). These do work:
df$patid<- gsub("\\-1$", "", df$patid)
df$patid<- gsub("\\-2$", "", df$patid)
df$patid<- gsub("\\-3$", "", df$patid)
df$patid<- gsub("\\-a$", "", df$patid)
df$patid<- gsub("\\-A$", "", df$patid)
df$patid<- gsub("\\-b$", "", df$patid)
df$patid<- gsub("\\-B$", "", df$patid)
df$patid<- gsub("\\b", "", df$patid)
df$patid<- gsub("\\/dd", "", df$patid)
I am not tied to gsub, and am open to different methods.
Example of ID numbers:
patid<- c("MB-13-169454", "MB-13-179455", "MB-13-212235.1", "MB-13-212235.2", "MB-13-224683", "570548260-2", "570548260-3", "1458629P-2", "1139093D-2", "8253015N/2", "8253015N/3", "M255858/1", "M255858/2", "8494392Q/2", "9296741B/2", "04152341421/A", "04152341421/B", "04152640475/B", "04152821164/A", "G140381883_1", "G140381883_2", "G140880774_1", "G140880774_2")
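To make the target concrete, here is a rough sketch of the kind of single rule I am hoping for, run on a subset of the IDs above. It rests on an assumption of mine that may not hold for every hospital: a secondary line number is one punctuation character followed by at most two letters or digits at the very end of the ID. The helper name strip_suffix is hypothetical.

```r
# A subset of the example IDs from this question.
patid <- c("MB-13-169454", "MB-13-212235.1", "570548260-2", "1458629P-2",
           "8253015N/3", "M255858/1", "04152341421/A", "G140381883_2")

# Assumption: the suffix is one punctuation character plus one or two
# alphanumeric characters, anchored at the end of the string.
# Longer final segments (e.g. "-169454") are left alone.
strip_suffix <- function(x) sub("[[:punct:]][[:alnum:]]{1,2}$", "", x)

root <- strip_suffix(patid)
print(root)
```

With this rule MB-13-169454 stays as it is, while MB-13-212235.1 becomes MB-13-212235 and 8253015N/3 becomes 8253015N. The obvious weakness is a root ID whose last segment is itself only one or two characters long, which would be stripped as well.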
Apologies if this has been answered somewhere already.
- + numbers at the end of the string. Was it? I see MB-13-169454 in your input, and Tim's solution returns this "code" unchanged, is it expected? – Prussiate
MB-13-169454 as an example of an ID that I didn't want manipulated, as this would be a root patient ID, rather than a duplicate – Enigmatic
8253015N/21, you want to keep it as is or return 8253015N? – Prussiate
8253015N – Enigmatic
MB-13-169454 from 8253015N/21? A number of alphanumerics at the end of the string? – Prussiate