Simply speaking, you can use quick-find.
The key is to use two temporary list.
The first is called elements, which stores all elements existing in all groups.
The second is named labels. I got the inspiration from sklearn's kmeans algorithm. 'labels' stores the labels or the centroids of the elements. Here I simply let the first element in the cluster be the centroid. Initially, the values are from 0 to length-1, ascendingly.
For each group, I get their 'indices' in 'elements'.
I then get the labels for group according to indices.
And I calculate the min of the labels, that will be the new label for them.
I replace all elements with labels in labels for group with the new label.
Or to say, for each iteration,
I try to combine two or more existing groups.
If the group has labels of 0 and 2
I find out the new label 0, that is the min of the two.
I than replace them with 0.
def cluser_combine(groups):
n_groups=len(groups)
#first, we put all elements appeared in 'gruops' into 'elements'.
elements=list(set.union(*[set(g) for g in groups]))
#and sort elements.
elements.sort()
n_elements=len(elements)
#I create a list called clusters, this is the key of this algorithm.
#I was inspired by sklearn kmeans implementation.
#they have an attribute called labels_
#the url is here:
#https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
#i called this algorithm cluster combine, because of this inspiration.
labels=list(range(n_elements))
#for each group, I get their 'indices' in 'elements'
#I then get the labels for indices.
#and i calculate the min of the labels, that will be the new label for them.
#I replace all elements with labels in labels_for_group with the new label.
#or to say, for each iteration,
#i try to combine two or more existing groups.
#if the group has labels of 0 and 2
#i find out the new label 0, that is the min of the two.
#i than replace them with 0.
for i in range(n_groups):
#if there is only zero/one element in the group, skip
if len(groups[i])<=1:
continue
indices=list(map(elements.index, groups[i]))
labels_for_group=list(set([labels[i] for i in indices]))
#if their is only one label, all the elements in group are already have the same label, skip.
if len(labels_for_group)==1:
continue
labels_for_group.sort()
label=labels_for_group[0]
#combine
for k in range(n_elements):
if labels[k] in labels_for_group[1:]:
labels[k]=label
new_groups=[]
for label in set(labels):
new_group = [elements[i] for i, v in enumerate(labels) if v == label]
new_groups.append(new_group)
return new_groups
I printed out the detailed result for your question:
cluser_combine([['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']])
elements:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'k', 'o', 'p']
labels:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
--------------------group 0-------------------------
the group is:
['a', 'b', 'c']
indices for the group in elements
[0, 1, 2]
labels before combination
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
combining...
labels after combination
[0, 0, 0, 3, 4, 5, 6, 7, 8, 9]
--------------------group 1-------------------------
the group is:
['b', 'd', 'e']
indices for the group in elements
[1, 3, 4]
labels before combination
[0, 0, 0, 3, 4, 5, 6, 7, 8, 9]
combining...
labels after combination
[0, 0, 0, 0, 0, 5, 6, 7, 8, 9]
--------------------group 2-------------------------
the group is:
['k']
--------------------group 3-------------------------
the group is:
['o', 'p']
indices for the group in elements
[8, 9]
labels before combination
[0, 0, 0, 0, 0, 5, 6, 7, 8, 9]
combining...
labels after combination
[0, 0, 0, 0, 0, 5, 6, 7, 8, 8]
--------------------group 4-------------------------
the group is:
['e', 'f']
indices for the group in elements
[4, 5]
labels before combination
[0, 0, 0, 0, 0, 5, 6, 7, 8, 8]
combining...
labels after combination
[0, 0, 0, 0, 0, 0, 6, 7, 8, 8]
--------------------group 5-------------------------
the group is:
['p', 'a']
indices for the group in elements
[9, 0]
labels before combination
[0, 0, 0, 0, 0, 0, 6, 7, 8, 8]
combining...
labels after combination
[0, 0, 0, 0, 0, 0, 6, 7, 0, 0]
--------------------group 6-------------------------
the group is:
['d', 'g']
indices for the group in elements
[3, 6]
labels before combination
[0, 0, 0, 0, 0, 0, 6, 7, 0, 0]
combining...
labels after combination
[0, 0, 0, 0, 0, 0, 0, 7, 0, 0]
([0, 0, 0, 0, 0, 0, 0, 7, 0, 0],
[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'o', 'p'], ['k']])
Please refer to my github jupyter notebook for details