Hopefully this can be done with Python! I ran two clustering programs on the same data and now have a cluster file from each. I reformatted the files so that they look like this:
Cluster 0:
Brucellaceae(10)
Brucella(10)
abortus(1)
canis(1)
ceti(1)
inopinata(1)
melitensis(1)
microti(1)
neotomae(1)
ovis(1)
pinnipedialis(1)
suis(1)
Cluster 1:
Streptomycetaceae(28)
Streptomyces(28)
achromogenes(1)
albaduncus(1)
anthocyanicus(1)
etc.
These files contain bacterial species info. Each cluster starts with the cluster number (Cluster 0), then right below it the family (Brucellaceae) and the number of bacteria in that family (10). Under that are the genera found in that family (name followed by number, e.g. Brucella(10)), and finally the species in each genus (abortus(1), etc.).
My question: I have 2 files formatted in this way and want to write a program that will look for differences between the two. The only problem is that the two programs cluster in different ways, so two clusters may be the same even if the actual "Cluster Number" is different (the contents of Cluster 1 in one file might match Cluster 43 in the other file, the only difference being the cluster number). So I need something that ignores the cluster number and focuses on the cluster contents.
Is there any way I could compare these 2 files to examine the differences? Is it even possible? Any ideas would be greatly appreciated!
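One way to ignore the cluster numbers is to reduce each cluster to the set of lines it contains and compare those sets. The following is only a minimal sketch: the names `parse_clusters` and `unmatched_clusters` are made up for illustration, and it assumes every cluster starts with a `Cluster N:` header and that no two lines inside one cluster are identical (a set would collapse them).

```python
import re

# Assumed header format, taken from the sample above: "Cluster 0:"
CLUSTER_HEADER = re.compile(r"^Cluster\s+(\d+):\s*$")


def parse_clusters(text):
    """Map each cluster number to a frozenset of its entry lines."""
    clusters = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = CLUSTER_HEADER.match(line)
        if match:
            current = int(match.group(1))
            clusters[current] = set()
        elif current is not None:
            clusters[current].add(line)
    # frozenset is hashable, so contents can be put into sets and dict keys
    return {number: frozenset(entries) for number, entries in clusters.items()}


def unmatched_clusters(a, b):
    """Return the cluster numbers in each file with no identical twin in the other."""
    contents_a = set(a.values())
    contents_b = set(b.values())
    only_a = sorted(n for n, c in a.items() if c not in contents_b)
    only_b = sorted(n for n, c in b.items() if c not in contents_a)
    return only_a, only_b


# Tiny made-up inputs in the same layout as the real files
file_a = """Cluster 0:
Brucellaceae(2)
Brucella(2)
abortus(1)
canis(1)
Cluster 1:
Streptomycetaceae(1)
Streptomyces(1)
albaduncus(1)
"""
file_b = """Cluster 7:
Streptomycetaceae(1)
Streptomyces(1)
albaduncus(1)
Cluster 43:
Brucellaceae(1)
Brucella(1)
abortus(1)
"""

only_a, only_b = unmatched_clusters(parse_clusters(file_a), parse_clusters(file_b))
print(only_a, only_b)  # [0] [43]
```

For real files you would read the text with `open(path).read()` instead of the inline strings. Cluster 1 and Cluster 7 are treated as the same cluster here because their contents are identical, regardless of the number.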
diff still might be the easiest solution. – Platinumblond
diff but I'm worried it won't have the flexibility I need for ignoring the cluster numbers – Lisk
Python is indeed a good choice) to extract values into instance fields or simply parse the content into built-in data types like dict and let the tools dict offers to do the job for you. – Southwestwardly
dict would be able to help me with that? – Lisk
dict may or may not be able to. But the idea here is that if you're able to parse the data and represent them as dict, your first step is solid. Whether dict would satisfy your needs or not, really depends on the actual use cases you have. From the ones you gave, though vague, I'm confident that dict is capable to do it for you. Btw, @limelights I can't help but to +1 for your definition of classes. – Southwestwardly
diff – Saleable
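To make the dict suggestion from the comments a bit more concrete, here is a rough sketch of what it could look like once each file has been parsed into a dict of `{cluster_number: frozenset_of_lines}`. The function names are hypothetical, the two dicts are made-up sample data, and the Jaccard similarity used to find the "closest" cluster is just one possible choice, not something the original posters proposed. It also assumes no two clusters in the same file have identical contents.

```python
def match_cluster_numbers(a, b):
    """Pair up cluster numbers from two files whose contents are identical."""
    # Invert file B so the contents become the key (frozensets are hashable).
    by_contents = {contents: number for number, contents in b.items()}
    return {number: by_contents[contents]
            for number, contents in a.items()
            if contents in by_contents}


def closest_cluster(contents, other):
    """Find the most similar cluster in `other` and report what differs.

    Assumes `other` is not empty.
    """
    def jaccard(x, y):
        union = x | y
        return len(x & y) / len(union) if union else 1.0

    number = max(other, key=lambda n: jaccard(contents, other[n]))
    only_in_first = contents - other[number]
    only_in_second = other[number] - contents
    return number, only_in_first, only_in_second


# Made-up data: what two parsed files might look like
a = {
    0: frozenset({"Brucellaceae(2)", "Brucella(2)", "abortus(1)", "canis(1)"}),
    1: frozenset({"Streptomycetaceae(1)", "Streptomyces(1)", "albaduncus(1)"}),
}
b = {
    7: frozenset({"Streptomycetaceae(1)", "Streptomyces(1)", "albaduncus(1)"}),
    43: frozenset({"Brucellaceae(1)", "Brucella(1)", "abortus(1)"}),
}

same = match_cluster_numbers(a, b)
print(same)  # {1: 7}

# For clusters without an exact twin, show the nearest one and the differences
for number in sorted(set(a) - set(same)):
    nearest, missing, extra = closest_cluster(a[number], b)
    print(number, "->", nearest, sorted(missing), sorted(extra))
```

Because the counts are part of each line, Brucellaceae(2) and Brucellaceae(1) count as different entries here. If only the names matter, stripping the "(n)" suffix before building the sets would be a reasonable variation.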