Suppose I have my tree in the form of an adjacency list
Strictly speaking, what this adjacency list describes is a connected graph, not a tree. I assume that was just a slip in wording and I will propose it as an edit, but it matters for understanding the problem, so I want you to be aware of it.
That it is a graph and not a tree can be seen from these lines:
A 2 B 12 I 25
B 3 C 10 H 40 I 8
Here you already have a cycle, A-B-I:
    A
12 / \ 25
  B---I
    8
A tree by definition contains no cycles, so this cannot be a tree!
There is one more thing one can see from the way I presented this subgraph:
the edges have weights, not the nodes.
Now let's take a look at your provided example:
A 2 B 12 I 25
B 3 C 10 H 40 I 8
C 2 D 18 G 55
D 1 E 44
E 2 F 60 G 38
F 0
G 1 H 35
H 1 I 35
First of all we can see that this adjacency list is either incorrect or already optimized for your needs. This can (for example) be seen from the lines
E 2 F 60 G 38
F 0
The first line lists an edge from E to F, yet the second line says F has a degree of 0. The degree of a node is the number of edges incident to it, so F should have a degree of at least 1!
This is what the adjacency list would look like if it were 'complete':
A 2 B 12 I 25
B 4 A 12 C 10 H 40 I 8
C 3 B 10 D 18 G 55
D 2 C 18 E 44
E 3 D 44 F 60 G 38
F 1 E 60
G 3 C 55 E 38 H 35
H 3 B 40 G 35 I 35
I 3 A 25 B 8 H 35
We can see a lot of redundancy here because every edge occurs twice. This is why your 'optimized' adjacency list suits your needs better: with the 'complete' version you would do many useless checks.
However, you should only rely on this if you can be sure that all data given to your code is already 'optimized' (or rather in this format)! I will take this as a given from now on.
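To make the format concrete, here is a minimal sketch of reading the 'optimized' list into edge tuples. It assumes each line is a node name, the number of listed neighbours, and then neighbour/weight pairs; the name `parse_edges` is made up for this illustration.

```python
def parse_edges(lines):
    """Parse lines such as 'A 2 B 12 I 25' into (weight, u, v) tuples."""
    edges = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        node, count = tokens[0], int(tokens[1])
        pairs = tokens[2:]
        if len(pairs) != 2 * count:
            raise ValueError(f"expected {count} neighbour/weight pairs in {line!r}")
        # pairs alternates neighbour name, weight, neighbour name, weight, ...
        for i in range(0, len(pairs), 2):
            edges.append((int(pairs[i + 1]), node, pairs[i]))
    return edges


EXAMPLE = [
    "A 2 B 12 I 25",
    "B 3 C 10 H 40 I 8",
    "C 2 D 18 G 55",
    "D 1 E 44",
    "E 2 F 60 G 38",
    "F 0",
    "G 1 H 35",
    "H 1 I 35",
]

edges = parse_edges(EXAMPLE)
print(len(edges))   # 12, each edge appears exactly once
print(min(edges))   # (8, 'B', 'I'), the cheapest edge
```

Putting the weight first in each tuple means a plain sort orders the edges by weight, which is convenient for Kruskal's algorithm.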
Let's talk about your data structure.
You may of course use your proposed array of strings, but to be honest, I would rather recommend something like @AmirAfghani's proposed data structure. His approach would make your work easier (as he already pointed out, it is closer to your mental representation) and probably more efficient as well, since you would otherwise be doing many operations on strings (this is a guess, so don't rely on it ;)).
In the title you asked about Prim's algorithm, but in your actual question you said either Prim's or Kruskal's. I will go with Kruskal's simply because it is much easier and you seem to accept both.
Kruskal's algorithm
Kruskal's algorithm is fairly simple. It is basically this:
- Start with every node, but no edges
Repeat the following as often as possible:
- From all of the unused/unchosen edges, choose the one with the lowest weight (if there is more than one, just pick any of them)
- Check whether this edge would cause a cycle. If it does, mark it as chosen and 'discard' it.
- If it doesn't cause a cycle, use it and mark it as used/chosen.
That's all. It's really that simple.
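The steps above can be sketched roughly as follows. Note that this sketch swaps in a union-find (disjoint-set) structure for the cycle check instead of the depth-first search discussed in this answer; the function name `kruskal` and the `(weight, u, v)` tuple layout are assumptions made for the example.

```python
def kruskal(nodes, edges):
    """Return the edges of a minimum spanning tree as (weight, u, v) tuples."""
    parent = {n: n for n in nodes}  # every node starts in its own component

    def find(n):
        # Follow parent links to the component's representative,
        # shortening the path along the way (path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(edges):  # lowest weight first
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # both ends already connected: the edge would close a cycle
        parent[root_u] = root_v  # merge the two components
        tree.append((weight, u, v))
    return tree


edges = [
    (12, "A", "B"), (25, "A", "I"), (10, "B", "C"), (40, "B", "H"),
    (8, "B", "I"), (18, "C", "D"), (55, "C", "G"), (44, "D", "E"),
    (60, "E", "F"), (38, "E", "G"), (35, "G", "H"), (35, "H", "I"),
]
tree = kruskal("ABCDEFGHI", edges)
print(len(tree))                 # 8 edges for 9 nodes
print(sum(w for w, _, _ in tree))  # 216
```

Which of two equally heavy edges gets picked first depends on the sort order, but the total weight of the result is the same either way.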
However, I would like to mention one thing at this point, as I think it fits best here: in general there is no "the minimum spanning tree", only "a minimum spanning tree", because there can be several of them. Your results may therefore vary!
Back to your data structure! You may now see why it would be a bad idea to use an array of strings as a data structure. You would repeat many operations on strings (for example, checking the weight of every edge), and strings are immutable, which means you cannot simply 'throw away' one of the edges or mark an edge in any way.
Using a different approach you could just set the weight to -1, or even remove the entry for the edge!
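As an illustration of what such a mutable representation might look like, here is a small sketch. The `Edge` class, its `state` field and the helper `cheapest_unchosen` are hypothetical names invented for this example; this is not necessarily the structure proposed in the other answer.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    u: str
    v: str
    weight: int
    state: str = "unchosen"  # later set to "used" or "discarded"


def cheapest_unchosen(edges):
    """Return the lightest edge that has not been looked at yet, or None."""
    candidates = [e for e in edges if e.state == "unchosen"]
    return min(candidates, key=lambda e: e.weight) if candidates else None


edges = [Edge("A", "B", 12), Edge("A", "I", 25), Edge("B", "I", 8)]

first = cheapest_unchosen(edges)   # the B-I edge with weight 8
first.state = "used"               # marking is a simple attribute assignment
second = cheapest_unchosen(edges)  # now the A-B edge with weight 12
```

Because the objects are mutable, marking or discarding an edge does not require rebuilding any strings.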
Depth-First Search
From what I have seen, I think your main problem is deciding whether an edge would cause a cycle or not, right?
To decide this, you will probably have to call a Depth-First Search algorithm and use its return value.
The basic idea is this: start from one of the two nodes of the chosen edge (I will call this node the root) and try to find a path to the node at the other end of the chosen edge (I will call this node the target node). Of course, you search in the graph that doesn't contain this edge yet.
- Choose one unvisited edge incident to your root
- Mark this chosen edge as visited (or something like that, whatever you like)
- Repeat the last two steps for the node on the other side of the visited edge
- Whenever there are no unvisited edges left at the current node, go back one edge
- If you are back at your root and no unvisited edges incident to it remain, you are done. Return false.
- If you visit your target node at any point, you are done. Return true.
Now, if this method returns true, the edge would cause a cycle.
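A minimal sketch of this check could look like the following. It assumes the edges used so far are stored as a dictionary mapping each node to a list of its neighbours, and it marks visited nodes rather than visited edges, which is a small variation on the steps above. The name `would_create_cycle` is made up for the example.

```python
def would_create_cycle(adjacency, root, target):
    """Return True if target is already reachable from root.

    adjacency holds only the edges used so far; if a path exists,
    adding the edge root-target would close a cycle.
    """
    visited = {root}
    stack = [root]  # explicit stack instead of recursion
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                stack.append(neighbour)
    return False  # everything reachable from root was visited without finding target


# Edges A-B and B-I are already in use; each is stored in both directions.
used = {"A": ["B"], "B": ["A", "I"], "I": ["B"]}

print(would_create_cycle(used, "A", "I"))  # True: A-B-I already connects them
print(would_create_cycle(used, "A", "H"))  # False: H is not reachable yet
```

Note that the search needs each used edge stored in both directions, even if the input list stores every edge only once.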