Algorithm/approximation for combined independent set/hamming distance
Asked Answered
C

2

10

Input: Graph G Output: several independent sets, so that the membership of a node to all independent sets is unique. A node therefore has no connections to any node in its own set. Here is an example path.

Since clarification was called for here another rephrasal:

Divide a given graph into sets so that

  1. i can tell a node from all others by its membership in sets e.g. if node i is present only in set A no other node should be present in set A only

    if node j is present in set A and B then no other node should be present in set A and B only. if the membership of nodes is coded by a bit pattern, then these bit patterns have hamming distance at least one

  2. if two nodes are adjacent in the graph, they should not be present in the same set, hence be an independent set

Example: B has no adjacent nodes D=>A, A=>D

Solution:

  1. A B /
  2. / B D

A has bit pattern 10 and no adjacent node in its set. B has bit pattern 11 and no adjacent node, D has 01 therefore all nodes have hamming distance at least 1 an no adjacent nodes => correct

Wrong, because D and A are connected:

  1. A D B
  2. / D B

A has bit pattern 10 and D in its set, they are adjacent. B has bit pattern 11 and no adjacent node, D has 11 as has B, so there are two errors in this solution and therefore it is not accepted.

Of course this should be extended to more Sets as the number of Nodes in the Graph increases, since you need at least log(n) sets.

I already wrote a transformation into MAX-SAT, to use a sat-solver for this. but the number of clauses is just to big. A more direct approach would be nice. So far I have an approximation, but I would like an exact solution or at least a better approximation.

I have tried an approach where I used a particle swarm to optimize from an arbitrary solution towards a better one. However the running time is pretty awful and the results are far from great. I am looking for a dynamic algorithm or something, however i cannot fathom how to divide and conquer this problem.

Cover answered 18/8, 2010 at 15:24 Comment(10)
nope. it is the abstraction of a project i am working on right now. thought i'd get some input from here, since its essentially two np-complete problems rolled into one.Cover
reduced it onto sat, though number of clauses is much too high for a sat-solver. any ideas on how to divide and conquer this?Cover
How big is the input data? Do you have any sample data to test on? What performance do you expect?Quadriplegia
inputdata would be around 100 to 200 nodes. i can generate a testset, but not show real data. My solution needs for a 115 - dataset 20 minutes to 10 hours depending on quality using a particle swarm algorithm, but without guarantee of an optimal solution.Cover
Please review your example. For example the sentence "Wrong, because D and A are connected:" ... and so are A and B, but bidirectionally. Also "A node therefore has no connections to any node in its own set" (??)Shopping
Could you try re-stating the question a little more clearly? I'm not clear on how the comma-grouped lists of connections in your "path" relates to the input graph G. For example, D=>A appears twice... does that mean there are two paths from D to A in the graph?Idzik
You're willing to throw 200 rep for a bounty, so please consider making the investment worth it by providing more clarification, diagrams, anything to help us understand the problem so we can help you quicker.Ulrikaumeko
sorry guys went on a trip for the weekend. will clarify now. i will be back in an hour.Cover
Your question is still unclear. Do you want a method to generate all possible sets of sets that meet these conditions? Or just one? Or the one that optimizes some measure like "these nodes are in many sets, and those nodes are in few"? And what does "weight the connectivities" mean, consdiering that any legal division will cut all edges in the graph?Shotputter
for the moment i just want one possible collection of sets that meet these conditions. Getting them all would be a bonus though. I could employ the optimization with that one then. There can be of course several solutions at once. i take the weighting out for now, since it complicates things too much. It would be a cost function that prefers the presence of having nodes present multiple times instead of others.Cover
S
7

Not a complete answer, and I don't know how useful it will be to you. But here goes:

The hamming distance strikes me as a red herring. Your problem statement says it must be at least 1 but it could be 1000. It suffices to say the bit encoding for each node's set memberships is unique.

Your problem statement doesn't spell it out, but your solution above suggests every node must be a member of at least 1 set. ie. a bit encoding of all 0's is not allowed for any node's set memberships.

Ignoring connected nodes for a moment, disjoint nodes are easy: Simply number them sequentially with an unused bit encoding. Save those for last.

Your example above uses directed edges, but again, that strikes me as a red herring. If A cannot be in the same set as D because A=>D, D cannot be in the same set as A regardless whether D=>A.

You mention needing at least log(N) sets. You will also have at most N sets. A fully connected graph (with (N^2-N)/2 undirected edges) will require N sets each containing a single node.

In fact, if your graph contains a fully connected simplex of M dimensions (M in 1..N-1) with M+1 vertices and (M^2+M)/2 undirected edges, you will require at least M+1 sets.

In your example above, you have one such simplex (M=1) with 2 vertices {A,D} and 1 (undirected) edge {(A,D)}.

It would seem that your problem boils down to finding the largest fully connected simplexes in your graph. Stated differently, you have a routing problem: How many dimensions do you need to route your edges so none cross? It doesn't sound like a very scalable problem.

The first large simplex found is easy. Every vertex node gets a new set with its own bit.

The disjoint nodes are easy. Once the connected nodes are dealt with, simply number the disjoint nodes sequentially skipping any previously used bit patterns. From your example above, since A and D take 01 and 10, the next available bit pattern for B is 11.

The tricky part then becomes how to fold any remaining simplexes as much as possible into the existing range before creating any new sets with new bits. When folding, one must use 2 or more bits (sets) for each node, and the bits (sets) must not intersect with the bits (sets) for any adjacent node.

Consider what happens to your example above when one adds another node, C, to the example:

If C connects directly to both A and D, then the initial problem becomes finding the 2-simplex with 3 vertices {A,C,D} and 3 edges {(A,c),(A,D),(C,D)}. Once A, C and D take the bit patterns 001, 010 and 100, the lowest available bit pattern for disjoint B is 011.

If, on the other hand, C connects directly A or D but not both, the graph has two 1-simplexes. Supposing we find the 1-simplex with vertices {A,D} first giving them the bit patterns 01 and 10, the problem then becomes how to fold C into that range. The only bit pattern with at least 2 bits is 11, but that intersects with whichever node C connects to so we have to create a new set and put C in it. At this point, the solution is similar to the one above.

If C is disjoint, either B or C will get the bit pattern 11 and the remaining one will need a new set and get the bit pattern 100.

Suppose C connects to B but not to A or D. Again, the graph has two 1-simplexes but this time disjoint. Suppose {A,D} is found first as above giving A and D the bit patterns 10 and 01. We can fold B or C into the existing range. The only available bit pattern in the range is 11 and either B or C could get that pattern as neither is adjacent to A or D. Once 11 is used, no bit patterns with 2 or more bits set remain, and we will have to create a new set for the remaining node giving it the bit pattern 100.

Suppose C connects to all 3 A, B and D. In this case, the graph has a 2-simplex with 3 vertexes {A,C,D} and a 1-simplex with 2 vertexes {B, C}. Proceeding as above, after processing the largest simplex, A, C and D will have bit patterns 001, 010, 100. For folding B into this range, the available bit patterns with 2 or more bits set are: 011, 101, 110 and 111. All of these except 101 intersect with C so B would get the bit pattern 101.

The question then becomes: How efficiently can you find the largest fully-connected simplexes?

If finding the largest fully connected simplex is too expensive, one could put an approximate upper bound on potential fully connected simplexes by finding maximal minimums in terms of connections:

  1. Sweep through the edges updating the vertices with a count of the connecting edges.

  2. For each connected node, create an array of Cn counts initially zero where Cn is the count of edges connected to the node n.

  3. Sweep through the edges again, for the connected nodes n1 and n2, increment the count in n1 corresponding to Cn2 and vice versa. If Cn2 > Cn1, update the last count in the n1 array and vice versa.

  4. Sweep through the connected nodes again, calculating an upper bound on the largest simplex each node could be a part of. You could build a pigeon-hole array with a list of vertices for each upper bound as you sweep through the nodes.

  5. Work through the pigeon-hole array from largest to smallest extracting and folding nodes into unique sets.

If your nodes are in a set N and your edges in a set E, the complexity will be: O(|N|+|E|+O(Step 5))

If the above approximation suffices, the question becomes: How efficiently can you fold nodes into existing ranges given the requirements?

Schwann answered 30/8, 2010 at 4:59 Comment(6)
yes that is pretty much what i am looking for. points for helping me rephrase that.Cover
+1. Can you clarify what you mean by simplex? I was reading en.wikipedia.org/wiki/Simplex_graph but I don't think you mean a graph that has a node for every clique in a corresponding graph. Maybe simplex just means "subgraph" in this context?Varien
A simplex is a shape: a polygon, polyhedron or polytope. It is the simplest closed shape comprising straight edges, sides or surfaces for a given number of dimensions. A simplex is always convex. In 2 dimensions, the simplex is a triangle. In 3 dimensions, the simplex is a tetrahedron. In 1 dimension, the simplex is a line segment. For any given number of dimensions n, the n-simplex has n+1 vertices and n+1 "sides". See: en.wikipedia.org/wiki/SimplexSchwann
ok, since the question will end tomorrow and i dont want you missing your bounty, i am closing this. can you specify where you found the answer or further reading on that subject, should i get some more questions during the implementation? For example how do i get an upper bound for the largest simplex a node is part of? Also bullet point 5 i would need some more explanations for.Cover
perhaps i will post a followup question once i get the reps back. thank you for your effort.Cover
I don't know how helpful it will be going forward, but I have done a bunch of work in the past involving delaunay triangulations, voronoi tessellations and k-means, which are all related to each other but not directly to this problem. However, that work made me familiar with simplexes. To be a part of an n-simplex, a node must have n or more connected neighbors that each have n or more connected neighbors. For step 4, start at the end of the array and work back until you find enough connected neighbors. For actual simplexes, the neighbors connect to each other, but not in our approximation.Schwann
P
1

This maybe not the answer you might expect, but I can't find a place to add a comment. So I type it directly here. I can't fully understand your question. Or does it need specific knowledge to understand? What is this independent set? As I know a node in an independent set from a directed graph have a two way path to any other node in this set. Is your notion the same?

If this problem is like what I assume, independent sets can be found by this algorithm: 1. do depth-first search on the directed graph, records the time of tree rooted by this node is traversed. 2. then reverse all the edges in this graph 3. do depth-frist search again on the modified graph. The algorihtm is precisely explained by book "introduction to alogrithm"

Permittivity answered 27/8, 2010 at 10:12 Comment(1)
1. the question is abstract, that means the underlying mechanisms do not really matter. the independent set keeps conflicting components away from each other, however what the project will do is difficult to explain here, and my employer would not like it very much (: 2. for the problem of finding independent sets alone in a graph there are suitable approximations available. the main problem lies in the combination of the two problems, that a node's membership to the sets is unique and still has no connection to other nodes in its setsCover

© 2022 - 2024 — McMap. All rights reserved.