Greedy algorithm: Interval coloring
Asked Answered
A

3

7

In interval scheduling, the algorithm is to pick the earliest finish time. But in interval colouring the former does not work. Is there an example or explanation on why picking earliest finish time won't work for interval colouring?

The interval colouring problem is: 
given
 a 
set 
of 
intervals,
 we 
want 
to 
colour all
 intervals
 so 
that 
intervals
 given
 the
 same
 colour
 do 
not 
intersect
 and 
the
goal
 is 
to 
try 
to
 minimize 
the 
number 
of 
colours 
used. This can be thought of as the interval partitioning problem (if it makes more sense)

The interval scheduling problem that i'm referring to is: If you go to a theme park and there are many shows, the start and finish time of each show is an interval, and you are the resource. You want to attend as many shows as possible.

Ailing answered 16/2, 2016 at 0:15 Comment(2)
Could you provide precise definitions for the names you're using? Interval scheduling is a class of problems, and googling "interval coloring" doesn't turn up a problem by that name.Prolactin
@Prolactin I edited the post so that the questions that I'm referring to is more clear.Ailing
O
4

If you only need a counter example of greedy algorithm on coloring, @btilly provides one already.

I am trying to give reasons to make it more intuitive.

First, for scheduling problem, you can indeed prove greedy algorithm works. The idea is like this:

I CANNOT get better result if I'm NOT choosing the show having the earliest finish time, let's see.

If there is two intervals A, B, with A has earlier finish time, then B is either

  1. Start time later than A's finish time, then no conflict at all, why not both?
  2. Start time earlier than A's finish time, there is conflict, I can only choose A OR B, however, A ends earlier, it gives a higher chance to pick more shows afterwards, no?

For coloring problem, however, it is totally another category of problem.

You are forced to pick ALL intervals, while the answer of the problem is THE MAXIMUM # of CONFLICTED INTERVALS OF ALL TIME.

Try to think like this: For all time, there are MAXIMUM 5 exams happening at the same time, you have AT LEAST to use 5 classrooms (colors), right?

So we cannot find this with choosing earliest finish time, the time did not tell you anything.

It may help you to decide whether you PICK or not PICK this interval (like in scheduling problem), but cannot tell you the MINIMUM # of resources you need. They are just different category of problem.

EDITED:

After re-reading OP's question, here's more details as far as I know about the coloring problem.

Define depth be the maximum # of conflicts at all time. Logically we know depth is the lower bound, but we have to proof it is the upper bound as well (by contradiction).

Proof

The proof needs to SORT INTERVALS BY START TIME IN ASCENDING ORDER or FINISH TIME IN DESCENDING ORDER, as shown follow:

Assume the depth of the interval set is d, and the answer is greater than d. Let x be the first interval we process that is using resources d+1, as the processing order is sort by start time ascending, it means there is at least d intervals that start before x and has conflict with x, then the depth of the set is at least d+1, contradiction. So d = depth is also the upper bound of the answer, it is the optimal answer of interval coloring.

Note that if you sort by start time descending, or finish time ascending, then you cannot use the same reasoning.

Concepts / Goal

Now we know depth is the answer, we have to find it. Concept-wise, it DOES NOT matter if you find by using start time or finish time, ascending or descending, all options can give you the depth of the interval set.

Implementation Consideration

However, for implementation, if you have to find it in O(n lg n), you will have to make use of GREEDY METHOD + some DATA STRUCTURES, which probably need the intervals have some kind of ordering. But that's another story, it's for implementation, concept-wise, it does not matter, you only want to find the depth of the interval set.

TL;DR

For interval scheduling problem, the greedy method indeed itself is already the optimal strategy; while for interval coloring problem, greedy method only help to proof depth is the answer, and can be used in the implementation to find the depth (but not in the way as shown in @btilly's counter example)

Octastyle answered 16/2, 2016 at 1:21 Comment(4)
can you clarify your answer a bit? I understand the concept behind interval scheduling and why picking the earliest finish time is good. The thing that I'm struggling with is that, since earliest finish time grants the most number of intervals. Could we not use that to find the minimum number of colours?Ailing
Could you clarify how would you use finish time to find the min # of colors? Is it as @Maymaya show, you would pick the non-conflicted set using earliest finish time, all elements in this set use color A, then find next earliest non-conflicted set, all within it use color B, etc?Octastyle
It is as you described, I was thinking that since earliest finish time allow us to find the most number of non-conflicting set, we can then repeat this procedure which should lead to a lower number of required colours. At least this is the intuition behind itAiling
I think the point is, the way you pick non-conflicted intervals by earliest finish time into one set, cannot ensure the minimum # of total set. See @btilly's example. If you claim greedy strategy works for a problem, you can usually explain in plain English: why other strategy CANNOT be better than greedy (see interval scheduling proof). If you cannot think a reason behind, then maybe greedy is not the optimal strategy, then try to find a counter example. That's my experience...Octastyle
M
8

This is just a case of playing around with pictures until you find an example. The first picture I drew that showed the problem had the following partitioning:

A: (0, 2) (3, 7)
B: (1, 4) (5, 6)

As a picture that looks like this:

-- ----
 --- -

But looking for the earliest stop time rule produces the following coloring:

A: (0, 2) (5, 6)
B: (1, 4)
C: (3, 7)

Which is this partitioning:

--   -
 ---
   ----

So this greedy rule fails to be optimal on this example.

Maymaya answered 16/2, 2016 at 0:51 Comment(2)
Thank you for the counter example! But is there any intuition behind it? I was thinking since earliest finish time grants the most intervals, we can use it grab groups of intervals such that the minimum number of color occurs.Ailing
@JamesonYu Your algorithm went wrong because it prioritized short intervals, and inserted an unnecessarily large gap.Maymaya
O
4

If you only need a counter example of greedy algorithm on coloring, @btilly provides one already.

I am trying to give reasons to make it more intuitive.

First, for scheduling problem, you can indeed prove greedy algorithm works. The idea is like this:

I CANNOT get better result if I'm NOT choosing the show having the earliest finish time, let's see.

If there is two intervals A, B, with A has earlier finish time, then B is either

  1. Start time later than A's finish time, then no conflict at all, why not both?
  2. Start time earlier than A's finish time, there is conflict, I can only choose A OR B, however, A ends earlier, it gives a higher chance to pick more shows afterwards, no?

For coloring problem, however, it is totally another category of problem.

You are forced to pick ALL intervals, while the answer of the problem is THE MAXIMUM # of CONFLICTED INTERVALS OF ALL TIME.

Try to think like this: For all time, there are MAXIMUM 5 exams happening at the same time, you have AT LEAST to use 5 classrooms (colors), right?

So we cannot find this with choosing earliest finish time, the time did not tell you anything.

It may help you to decide whether you PICK or not PICK this interval (like in scheduling problem), but cannot tell you the MINIMUM # of resources you need. They are just different category of problem.

EDITED:

After re-reading OP's question, here's more details as far as I know about the coloring problem.

Define depth be the maximum # of conflicts at all time. Logically we know depth is the lower bound, but we have to proof it is the upper bound as well (by contradiction).

Proof

The proof needs to SORT INTERVALS BY START TIME IN ASCENDING ORDER or FINISH TIME IN DESCENDING ORDER, as shown follow:

Assume the depth of the interval set is d, and the answer is greater than d. Let x be the first interval we process that is using resources d+1, as the processing order is sort by start time ascending, it means there is at least d intervals that start before x and has conflict with x, then the depth of the set is at least d+1, contradiction. So d = depth is also the upper bound of the answer, it is the optimal answer of interval coloring.

Note that if you sort by start time descending, or finish time ascending, then you cannot use the same reasoning.

Concepts / Goal

Now we know depth is the answer, we have to find it. Concept-wise, it DOES NOT matter if you find by using start time or finish time, ascending or descending, all options can give you the depth of the interval set.

Implementation Consideration

However, for implementation, if you have to find it in O(n lg n), you will have to make use of GREEDY METHOD + some DATA STRUCTURES, which probably need the intervals have some kind of ordering. But that's another story, it's for implementation, concept-wise, it does not matter, you only want to find the depth of the interval set.

TL;DR

For interval scheduling problem, the greedy method indeed itself is already the optimal strategy; while for interval coloring problem, greedy method only help to proof depth is the answer, and can be used in the implementation to find the depth (but not in the way as shown in @btilly's counter example)

Octastyle answered 16/2, 2016 at 1:21 Comment(4)
can you clarify your answer a bit? I understand the concept behind interval scheduling and why picking the earliest finish time is good. The thing that I'm struggling with is that, since earliest finish time grants the most number of intervals. Could we not use that to find the minimum number of colours?Ailing
Could you clarify how would you use finish time to find the min # of colors? Is it as @Maymaya show, you would pick the non-conflicted set using earliest finish time, all elements in this set use color A, then find next earliest non-conflicted set, all within it use color B, etc?Octastyle
It is as you described, I was thinking that since earliest finish time allow us to find the most number of non-conflicting set, we can then repeat this procedure which should lead to a lower number of required colours. At least this is the intuition behind itAiling
I think the point is, the way you pick non-conflicted intervals by earliest finish time into one set, cannot ensure the minimum # of total set. See @btilly's example. If you claim greedy strategy works for a problem, you can usually explain in plain English: why other strategy CANNOT be better than greedy (see interval scheduling proof). If you cannot think a reason behind, then maybe greedy is not the optimal strategy, then try to find a counter example. That's my experience...Octastyle
L
2

Actually, the version of the algorithm you may see sorting the inputs with starting time may not work for the finish time. The key point here is that in the algorithm, when more than one color is available, the rule is to assign arbitrarily. The counter example above illustrates this point.

If you want to use the order of finish time instead, one change needs to be made to the algorithm. Let's assume for every existing color i, among all the interval colored by i, the largest finish time is F_i. Then when more than one color is available, the rule is to color the interval with the available color that has the largest F_i.

If you make this change, the algorithm works for order in finish time. I run simulation and also write done formal proof for this.

The intuition is simple, since the sorting is by finish time, the current interval you are assigning, you don't want it to take to much resource. Although there may be two color available, you want to use the one that is less "expensive", to save space for the next interval which may actually starts before the current one and ends after current one.

Let's use the example above, A: (0, 2) (5, 6) B: (1, 4) C: (3, 7) This does not work because when assigning (5,6), A is chosen, however, if following my version of the algorithm, we know the ending on A is 2, ending on B is 4, so you choose B, with the largest ending. This saves the spot for (3,7) which starts early than (5,6) but ends later.

We do not have this kind problem when ordering with starting time, because since the order is by starting time, you know after the current job, no further job will start earlier, so you don't need to save that resource.

Note, although this may be a way to make the order of finish time work, the running time is larger than the original algorithm with order of start time. since you will have to run through all color to find the one not only available but has largest ending.

Loganiaceous answered 12/1, 2017 at 13:2 Comment(2)
While this might be a valuable hint to solve the problem, a good answer also demonstrates the solution. Please EDIT to provide example code to show what you mean. Alternatively, consider writing this as a comment insteadNocturn
For those who are also not content with the accepted answer like me, look no further since this is a better solution. In essence, up until the first arbitrary assignment, all is fine. Once you hit the first assignment, though, as this reply states, an earlier overlapping range will just break the answer. As a final remark, not only did the accepted answer fail to explain why a particular sorting order is in favor, it also gave the most stereotypical mathematical proof that makes almost no difference if applied on wrong preconditions.Heisel

© 2022 - 2024 — McMap. All rights reserved.