I am trying to count the number of unique drugs in this list.
my_drugs=c('a', 'b', 'd', 'h', 'q')
I have the following dictionary of drug synonyms, but its rows are not restricted to unique drugs: the same drug can appear in more than one row.
dictionary <- read.table(header=TRUE, text="
drug names
a b;c;d;x
x b;c;q
r h;g;f
l m;n
")
So in this case there are 2 unique drugs in the list: a has b, d, and q as synonyms, either directly or indirectly (q through x), and h is a separate drug. Synonyms of synonyms count as synonyms.
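To make the expected result concrete: the "synonyms of synonyms" rule can be read as grouping names into connected components, where each dictionary row links its drug to all of its listed names. Below is a minimal base R sketch of that reading, not a definitive solution. The helper `synonym_groups` and its `extra` argument are hypothetical names introduced only for this illustration, and the sketch assumes names are separated by `;` with no surrounding whitespace.

```r
# Toy data copied from the question.
my_drugs <- c("a", "b", "d", "h", "q")
dictionary <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
drug names
a b;c;d;x
x b;c;q
r h;g;f
l m;n
")

# Hypothetical helper: give every name a group id so that names linked
# directly, or through a chain of synonyms, share the same id.
# `extra` holds names that may be absent from the dictionary; each of
# those ends up in a group of its own.
synonym_groups <- function(dict, extra = character()) {
  # One character vector per row: the drug followed by its synonyms.
  members <- Map(c, dict$drug, strsplit(dict$names, ";", fixed = TRUE))
  all_names <- unique(c(unlist(members, use.names = FALSE), extra))

  # parent[i] points towards the representative of name i's group.
  parent <- seq_along(all_names)

  # Follow parent pointers until reaching the root of a group.
  find_root <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  for (m in members) {
    roots <- vapply(match(m, all_names), find_root, integer(1))
    # Merge every group touched by this row into the smallest root.
    parent[roots] <- min(roots)
  }

  ids <- vapply(seq_along(all_names), find_root, integer(1))
  setNames(ids, all_names)
}

groups <- synonym_groups(dictionary, extra = my_drugs)

# Number of distinct groups represented in my_drugs.
n_unique <- length(unique(groups[my_drugs]))
n_unique
```

On the toy data this gives 2, matching the count described above: a, b, d, and q fall into one group and h into another. A graph library could do the same grouping; the loop here is only meant to make the rule explicit.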
My attempted approach was to first make a dictionary that only had unique drugs on the left side. To do this, I would cycle through dictionary$drug, grep for each drug in dictionary$drug and dictionary$names, take the union of the matching synonym lists, replace dictionary$names with that union, and then delete the other matching rows from the dictionary.
bigdf=dictionary
small_df=data.frame("drug"=NA,"names"=NA)
for(i in 1:nrow(bigdf)){
  search_term=sprintf("*%s*",bigdf$drug[i])
  index=grep(search_term,bigdf$names)
  list=bigdf$names[index]
  list=Reduce(union,list)
  list=paste(list, collapse=";")
  if(!list==""){
    new_row=data.frame("drug"=bigdf$drug[index][1],"names"=list)
    small_df=rbind(small_df,new_row)
    #small_df
    bigdf=bigdf[-index,]
    #dim(bigdf)
  }
  else{
    new_row=data.frame("drug"=bigdf$drug[index][1],"names"="alreadycounted")
    small_df=rbind(small_df,new_row)
  }
}
This did not work (some drugs were missing from small_df), and even if it had, I'm not sure how I would have used the new dictionary to count the number of unique drugs in my list.
How can I count the number of unique drugs in my_drugs?
Thank you for your help, and let me know if this needs further clarification.
Data Set Size: 200 elements in my_drugs, 2000 rows in dictionary, each drug has 10-12 synonyms.