Finding rectangles in a 2d block grid
Asked Answered
M

4

15

Let's say I have a grid of blocks, 7x12. We use the colors '*','%','@' and an empty cell '-'.

1 2 3 4 5 6 7
- - - - - - -  1
- - - - - - -  2
% % - - - - -  3
% % - - - - *  4 
% % - - - @ %  5
@ @ @ - - @ %  6
@ @ * * * - *  7
* * * % % % %  8 
% @ @ % * * %  9
% @ % % % % %  10
* * * % % @ @  11
* * @ @ @ @ *  12

I want to find rectangles in this grid of a certain minimum size, and the biggest I can find and then smaller until no rectangles greater or equal to the minimum size can be found.

In this example, consider the minimum size 1x4, 4x1, 2x2 so a 1x3 is not valid but a 2x3 is. If we want the biggest rectangles we find the following:

  • 4x1 at (4,8)
  • 5x1 at (3,10)
  • 2x3 at (1,3)
  • 2x2 at (6,1)
  • 2x2 at (1,11)
  • 4x1 at (3,12)

Note that rectangles cannot be in each others space, they cannot overlap. For example the 2x2 rectangle at (4,10) is not mentioned because it would overlap the 5x1 rectangle at (3,10).

All are perfectly valid rectangles: they are equal or greater that the minimum size and all the blocks per rectangle are of the same color.

What I want is to do this programmatically. When you tell someone to find rectangles in a grid, he finds them immediatly, without any thinking about it. The question is, how can I write an algoritm that does the same?

I considered bruteforcing but I need the algorithm to execute as fast as possible as it will need to be executed a lot in a very small time frame on a limited (mobile) device.

I see a lot of questions on the internet about rectangles, but I'm suprised this one hasn't been asked anywhere yet. Am I thinking too difficult or has no one ever wanted to do something like this?

Maineetloire answered 27/4, 2011 at 21:11 Comment(4)
Can rectangles cross? I see you didn't mention 2x2 at (4,10), is it because it has a common part with a bigger 5x1 at (3,10)?Keaton
No, rectangles cannot cross each other. Each rectangle cannot be in the space of others.Maineetloire
@Sebeazz wouldn't that potentially disqualify rectangles based on what order they were found in?Biparietal
@GlowCoder: No, because I want to find the biggest first.Maineetloire
B
12

Call the width and height of the input array W and H respectively.

  1. Run this clever O(WH) algorithm for determining the largest rectangle, but instead of tracking just the single largest rectangle, for each (x, y) location record in a W*H matrix the width and height of (one or all of) the largest rectangles whose top-left corner is (x, y), updating these values as you go.
  2. Loop through this matrix, adding each sufficiently-large rectangle in it to a max-heap ordered by area (width * height).
  3. Read entries out of this heap; they will be produced in decreasing area order. With every entry read whose top-left corner is (x, y) and which has width w and height h, mark each of the wh locations included in the rectangle as "used" in a WH bit array. When reading rectangles from the heap, we must discard any rectangles that contain "used" squares to avoid producing overlapping rectangles. It's sufficient to check just the four edges of each candidate rectangle against the "used" array, since the only other way that the candidate rectangle could overlap another rectangle would be if the latter rectangle was completely contained by it, which is impossible due to the fact that we are reading rectangles in decreasing area order.

This approach is "greedy" insofar as it won't guarantee to choose the largest sequence of rectangles overall if there are multiple ways to carve a solid coloured region into maximal rectangles. (E.g. it might be that there are several rectangles whose top-left corner is at (10, 10) and which have an area of 16: 16x1, 8x2, 4x4, 2x8, 1x16. In this case one choice might produce bigger rectangles "downstream" but my algorithm doesn't guarantee to make that choice.) If necessary you could find this overall optimal series of rectangles using backtracking, though I suspect this could be very slow in the worst case.

The maximum-rectangle algorithm I mention is designed for single-colour rectangles, but if you can't adapt it to your multi-colour problem you can simply run it once for each colour before starting step 2.

Burkhart answered 28/4, 2011 at 2:44 Comment(7)
Would you mind updating the link? It doesn't seem to work anymore.Postconsonantal
@JanBerktold: At first I thought so too -- it timed out. But then I found a different URL for the same article: drdobbs.com/database/the-maximal-rectangle-problem/…. This brought up the article after about 30s... and a second attempt on the original URL worked too! So I'm leaving the link as-is.Burkhart
Can you explain further @Burkhart how to do what you describe in step one? I'm not sure A) how to provide an x/y location to the algorithm, B) where to grab the data for each rectangle from the algorithm - I just see the max / max area.Esker
@ZacharyCarter: Whenever the algorithm chooses a maximum of several values to solve a particular subproblem, you should record in a "predecessor" array which choice this was. Then once you have solved the entire problem, you can trace backwards through this array to reconstruct an optimal solution.Burkhart
@Burkhart both links are still broken for mePiper
@AlexReinking: Same for me, just now. I wasn't able to find another copy of that article (in fact Google's top result for "dr dobbs maximal rectangle problem" is still the same, now-useless link!) but it seems the author of the article wrote an answer to a very similar question, and I've asked him if he has another copy online somewhere. Fingers crossed...Burkhart
As is often the case, Archive.org has a copy -- web.archive.org/web/20150921112543/http://www.drdobbs.com/…Piper
B
0

Note: this operates under the assumption that you're trying to find the biggest k rectangles.

We know we must, in the worst case, look at every node in the grid at least once. This means our best-case worst-cast is O(len*wid).

Your brute-force is going to be O(len*len*wid*wid) with the naive approach of "Checking for rectangles at a point is O(len*wid), and you do that O(len*wid) times.

It may be that you find this to not be the case, as each time you find a rectangle, you have potential to reduce the problem space. A brute force approach of "check each rectangle" I feel is going to be the best approach. There are things you can do to speed it up, though.

Basic algorithm:

for(x = 1 .. wid) {
    for(y = 1 .. len) {
        Rectangle rect = biggestRectOriginatingAt(x,y);
        // process this rectangle for being added
    }
}
  • Keep track of the largest k rectangles. As you go along, you can search the perimeter of where an eligible rectangle could possibly be.

    Rectangle biggestRectOriginatingAt(x,y) {
        area = areaOf(smallestEligibleRectangle); // if we want the biggest k rect's, this
                                                  // returns the area of the kth biggest
                                                  // known rectangle thus far
    
        for(i = 1 .. area) {
            tempwid = i
            templen = area / i
    
            tempx = x + tempwid
            tempy = y + templen
    
            checkForRectangle(x,y,tempx,tempy); // does x,y --> tempx,tempy form a rectangle?
        }
    
    }
    

This allows you to get big performance gains towards the end of your large searches (if it's a small search, you don't gain as much but you don't care because it's a small search!)

This also doesn't work as well for more random distrobutions.

  • Another optimization is to use a paint-fill algorithm to find the largest consecutive areas. This is O(len*wid), which is a small cost. This will allow you to search the most likely areas for a large rectangle to be.

Note that neither of these approaches reduce the worst case. But, they do reduce the real-world expected running time.

Biparietal answered 27/4, 2011 at 21:33 Comment(0)
F
0

I had to solve a very similar problem for my first person shooter. I use that in input:
[  ][  ][  ][  ][  ][  ][  ][  ]
[  ][  ][  ][X][  ][  ][  ][  ]
[  ][X][X][X][X][X][X][X]
[  ][  ][X][X][X][X][  ][  ]
[  ][X][X][X][X][  ][  ][  ]
[  ][X][X][X][X][  ][  ][  ]
[  ][  ][X][  ][  ][  ][  ][  ]
[  ][  ][  ][  ][  ][  ][  ][  ]

I get that in output:
[  ][  ][  ][  ][  ][  ][  ][  ]
[  ][  ][  ][A][  ][  ][  ][  ]
[  ][B][G][G][G][F][E][E]
[  ][  ][G][G][G][F][  ][  ]
[  ][D][G][G][G][  ][  ][  ]
[  ][D][G][G][G][  ][  ][  ]
[  ][  ][C][  ][  ][  ][  ][  ]
[  ][  ][  ][  ][  ][  ][  ][  ]

This schema is better. The source code (under GNU General Public License version 2) is here, it is heavily commented. You may have to adapt it a bit to your needs like the one suggested by j_random_hacker.

Fives answered 10/12, 2012 at 13:53 Comment(3)
Where exactly is the relevant code? The sourceforge folder is quite big.Postconsonantal
Jan, I've just fixed the URL, the code containing the relevant code is in ArrayHelper.computeFullArraysFromNonFullArray().Fives
@JanBerktold I have commented absolutely everything in this class and I have created a unit test that used square and rectangular arrays in input. It's not very fast but it works :)Fives
D
0

My own solution is to find the biggest rectangle, using the same algorithm as in @j_random_hacker answer, then split the remaining area into 4 regions, and recursively search for the biggest rectangle again in each of these regions.

Link to C++ sources

It will find less rectangles than the accepted answer, because I have found it difficult to adopt that algorithm to save each intermediary rectangle when searching for the biggest one. The algorithm skips all smaller rectangles, so we have to iterate through every point in our grid, to save every rectangle possible, then discard smaller ones, and this bumps the algorithm back to O(M³ ⋅ N³) complexity.

We can split the remaining area in two ways, the algorithm will check both, and will use the option which covers the most area, so it will perform the recursive call twice - first time to calculate the area, second time to fill the output array.

    ****|***|***                ************
    ****|***|***                ************
    ****#####***                ----#####---
    ****#####***        vs      ****#####***
    ****#####***                ----#####---
    ****|***|***                ************

We can leave only one choice of area split to make the algorithm run faster, because this area comparison does not provide much improvement to the amount of detected rectangles, to be honest.

Edit: I just realized that recursively checking both splitting variants raises the algorithm to factorial complexity, to something like O(min(M,N)!). So I've disabled the second area split, which leaves the algorithm with complexity around O(M⋅N⋅log(M⋅N)).

Delaminate answered 27/5, 2016 at 14:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.