DIvisive ANAlysis (DIANA) Hierarchical Clustering
Asked Answered
C

1

12

(This post is continuation of my previous question on divisive hierarchical clustering algorithm.)

The problem is how to implement this algorithm in Python (or any other language).

Algorithm description

A divisive clustering proceeds by a series of successive splits. At step 0 all objects are together in a single cluster. At each step a cluster is divided, until at step n - 1 all data objects are apart (forming n clusters, each with a single object).

Each step divides a cluster, let us call it R into two clusters A and B. Initially, A equals R and B is empty. In a first stage, we have to move one object from A to B. For each object i of A, we compute the average dissimilarity to all other objects of A:

enter image description here

The object i' for which equation above attains its maximal value will be moved, so we put

enter image description here

In the next stages, we look for other points to move from A to B. As long as A still contains more than one object, we compute

enter image description here

for each object i of A and we consider the object i'' that maximizes this quantity. When the maximal value of the equation above is strictly positive, we move i'' from A to B and then look in the new A for another object that might be moved. On the other hand, when the maximal value of the difference is negative or 0 we stop the process and the division of R into A and B is completed.

At each step of the divisive algorithm we also have to decide which cluster to split. For this purpose we compute the diameter

enter image description here

for each cluster Q that is available after the previous step, and choose the cluster for which diameter is largest.

My starting code is pasted below. Actually the script returns ValueError: list.remove(x): x not in list.

# Dissimilarity matrix
dm = [
[ 0, 2, 6, 10, 9],
[ 2, 0, 5,  9, 8],
[ 6, 5, 0,  4, 5],
[10, 9, 4,  0, 3],
[ 9, 8, 5,  3 ,0]]

# Create splinter and remaining group
splinter = []
remaining = range(len(dm))

# Find record ra that has the greatest average distance from the rest of the records
ra = 0
dMax = -999
for i in range(len(dm)):
  dSum = 0
  for j in range(len(dm)):
    if i == j:
      continue
    dSum += dm[i][j]
  if dMax < dSum:
    dMax = dSum
    ra = i

splinter.append(ra)
remaining.remove(ra)

# Check every record in remaining  and moves the record if record is closer to splinter
bChanged = True
while bChanged:
  bChanged = False
  for i in range(len(remaining)):
    d1 = 0.0
    for j in range(len(splinter)):
      d1 += dm[i][j]
    d1 /= float(len(splinter))
    d2 = 0.0
    for k in range(len(remaining)):
      if i == k:
        continue
      d2 += dm[i][k]
    if len(remaining) > 1:
      d2 /= (len(remaining) - 1.0)
    if d1 < d2:
      bChanged = True
      splinter.append(i)
      remaining.remove(i)
      break
Cubit answered 26/8, 2015 at 14:12 Comment(5)
So in other words, you have not yet started. Then better start writing the actual algorithm...Insolent
Maybe I don't know where to start...Cubit
Try starting with the first step of the algorithm. Starting at the end is harder.Insolent
I updated the code. Maybe now somebody has motivation to look at it.Cubit
why so complicated? put each object into an own cluster and you are done. simple loop, isn't it?Mussolini
K
0

You have used

  1. remaining.remove(ra)
  2. remaining.remove(i)

where both ra and i are indexes, and list.remove() expects an item to delete, not the index. You should use del remaining[ra] instead.

Kauslick answered 25/3, 2022 at 16:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.