How to find same-value rectangular areas of a given size in a matrix most efficiently?
Asked Answered
C

3

9

My problem is very simple but I haven't found an efficient implementation yet.

Suppose there is a matrix A like this:

0 0 0 0 0 0 0
4 4 2 2 2 0 0
4 4 2 2 2 0 0
0 0 2 2 2 1 1
0 0 0 0 0 1 1

Now I want to find all starting positions of rectangular areas in this matrix which have a given size. An area is a subset of A where all numbers are the same.

Let's say width=2 and height=3. There are 3 areas which have this size:

2 2   2 2   0 0
2 2   2 2   0 0
2 2   2 2   0 0

The result of the function call would be a list of starting positions (x,y starting with 0) of those areas.

List((2,1),(3,1),(5,0))

The following is my current implementation. "Areas" are called "surfaces" here.

case class Dimension2D(width: Int, height: Int)
case class Position2D(x: Int, y: Int)

def findFlatSurfaces(matrix: Array[Array[Int]], surfaceSize: Dimension2D): List[Position2D] = {

    val matrixWidth = matrix.length
    val matrixHeight = matrix(0).length
    var resultPositions: List[Position2D] = Nil

    for (y <- 0 to matrixHeight - surfaceSize.height) {
        var x = 0
        while (x <= matrixWidth - surfaceSize.width) {
            val topLeft = matrix(x)(y)
            val topRight = matrix(x + surfaceSize.width - 1)(y)
            val bottomLeft = matrix(x)(y + surfaceSize.height - 1)
            val bottomRight = matrix(x + surfaceSize.width - 1)(y + surfaceSize.height - 1)
            // investigate further if corners are equal
            if (topLeft == bottomLeft && topLeft == topRight && topLeft == bottomRight) {
                breakable {
                    for (sx <- x until x + surfaceSize.width;
                         sy <- y until y + surfaceSize.height) {
                        if (matrix(sx)(sy) != topLeft) {
                            x = if (x == sx) sx + 1 else sx 
                            break
                        }
                    }
                    // found one!       
                    resultPositions ::= Position2D(x, y)
                    x += 1
                }
            } else if (topRight != bottomRight) {
                // can skip x a bit as there won't be a valid match in current row in this area
                x += surfaceSize.width 
            } else {
                x += 1
            }
        }   
    }
    return resultPositions
}

I already tried to include some optimizations in it but I am sure that there are far better solutions. Is there a matlab function existing for it which I could port? I'm also wondering whether this problem has its own name as I didn't exactly know what to google for.

Thanks for thinking about it! I'm excited to see your proposals or solutions :)

EDIT: Matrix dimensions in my application range from 300x300 to 3000x3000 approximately. Also, the algorithm will only be called once for the same matrix. The reason is that the matrix will always be changed afterwards (approx. 1-20% of it).

RESULTS

I implemented the algorithms of Kevin, Nikita and Daniel and benchmarked them in my application environment, i.e. no isolated synthetic benchmark here, but special care was taken to integrate all algorithms in their most performant way which was especially important for Kevin's approach as it uses generics (see below).

First, the raw results, using Scala 2.8 and jdk 1.6.0_23. The algorithms were executed several hundred times as part of solving an application-specific problem. "Duration" denotes the total time needed until the application algorithm finished (of course without jvm startup etc.). My machine is a 2.8GHz Core 2 Duo with 2 cores and 2gig of memory, -Xmx800M were given to the JVM.

IMPORTANT NOTE: I think my benchmark setup is not really fair for parallelized algorithms like the one from Daniel. This is because the application is already calculating multi-threaded. So the results here probably only show an equivalent to single-threaded speed.

Matrix size 233x587:

                  duration | JVM memory | avg CPU utilization
original O(n^4) | 3000s      30M          100%  
original/-server| 840s       270M         100%
Nikita O(n^2)   | 5-6s       34M          70-80%
Nikita/-server  | 1-2s       300M         100%
Kevin/-server   | 7400s      800M         96-98%
Kevin/-server** | 4900s      800M         96-99%
Daniel/-server  | 240s       360M         96-99%

** with @specialized, to make generics faster by avoiding type erasure

Matrix size 2000x3000:

                  duration | JVM memory | avg CPU utilization
original O(n^4) | too long   100M         100%  
Nikita O(n^2)   | 150s       760M         70%
Nikita/-server  | 295s (!)   780M         100%
Kevin/-server   | too long, didn't try

First, a small note on memory. The -server JVM option uses considerably more memory at the advantage of more optimizations and in general faster execution. As you can see from the 2nd table Nikita's algorithm is slower with the -server option which is obviously due to hitting the memory limit. I assume that this also slows down Kevin's algorithm even for the small matrix as the functional approach is using much more memory anyways. To eliminate the memory factor I also tried it once with a 50x50 matrix and then Kevin's took 5secs and Nikita's 0secs (well, nearly 0). So in any case it's still slower and not just because of memory.

As you can see from the numbers, I will obviously use Nikita's algorithm because it's damn fast and this is absolutely necessary in my case. It can also be parallelized easily as Daniel pointed out. The only downside is that it's not really the scala-way.

At the moment Kevin's algorithm is probably in general a bit too complex and therefore slow, but I'm sure there are more optimizations possible (see last comments in his answer).

With the goal of directly transforming Nikita's algorithm to functional style Daniel came up with a solution which is already quite fast and as he says would even be faster if he could use scanRight (see last comments in his answer).

What's next?

At the technological side: waiting for Scala 2.9, ScalaCL, and doing synthetic benchmarks to get raw speeds.

My goal in all this is to have functional code, BUT only if it's not sacrificing too much speed.

Choice of answer:

As for choosing an answer, I would want to mark Nikita's and Daniel's algorithms as answers but I have to choose one. The title of my question included "most efficiently", and one is the fastest in imperative and the other in functional style. Although this question is tagged Scala I chose Nikita's imperative algorithm as 2s vs. 240s is still too much difference for me to accept. I'm sure the difference can still be pushed down a bit, any ideas?

So, thank you all very very much! Although I won't use the functional algorithms yet, I got many new insights into Scala and I think I slowly get an understanding of all the functional crazyness and its potential. (of course, even without doing much functional programming, Scala is much more pleasing than Java... that's another reason to learn it)

Chamade answered 11/1, 2011 at 10:39 Comment(9)
There are a couple of algorithms to find regions (like in the Paint program): Flood fill Region extraction. But they don't impose a rectangular pattern. Kevin's answer looks very good for this use case.Entomologize
@neo, just for curiousity, how big are the matrices you need to process?Gadgetry
@Paul in the range from approx. 300x300 to 3000x3000, that's why I'm really after the most efficient algorithm. I'm interested in ScalaCL too, but unfortunately my gfx card is too old for that...Chamade
I removed my answer, because I noticed a serious flaw in the algorithm. I'll let Kevin take this one. :-)Discernible
For whatever it is worth, Nikita's solution is easy to parallelize. The first computation, consecutive cells, can be done independently for each column. The second computation, shown in the answer, can be computed independently for each row. If you have a memory barrier between the two steps, that's all the synchronization you need to multithread it.Discernible
@neo If your matrix is composed of Int or Double, then one HUGE speed problem with Kevin's code is that it is parameterized. If you go through it and replace T with the actual type you are using (and removing the [T] type parameter declarations) you'll get a much faster code. You can do the same by sprinkling the code with @specialized annotations in the proper places too, which makes the code faster for AnyVal subclasses, yet still flexible.Discernible
@daniel it would have to be specialization. In the column pass, those methods are being called with T=(Int,Int)Mcdade
@Daniel I didn't expect parameterization to be a speed problem, I read up on this topic now at lamp.epfl.ch/~dragos/files/scala-spec.pdf I will update the benchmark results of Kevin's algorithm.Chamade
@Daniel Benchmark's updated, to be honest I expected a bit more... which may mean that the main work is not un/boxing but creating many temporary lists.Chamade
C
5

You can do it in O(n^2) relatively easy.
First, some-precalculations. For each cell in matrix, calculate how many consecutive cells below it have the same number.
For your example, resulting matrix a (couldn't think of better name :/) will look like this

0 0 0 0 0 2 2
1 1 2 2 2 1 1
0 0 1 1 1 0 0
1 1 0 0 0 1 1
0 0 0 0 0 0 0

It can be produced in O(n^2) easily.

Now, for each row i, let's find all rectangles with top side in row i (and bottom side in row i + height - 1).
Here's an illustration for i = 1

0 0 0 0 0 0 0
-------------
4 4 2 2 2 0 0
4 4 2 2 2 0 0
0 0 2 2 2 1 1
-------------
0 0 0 0 0 1 1

Now, the main idea

int current_width = 0;
for (int j = 0; j < matrix.width; ++j) {
    if (a[i][j] < height - 1) {
        // this column has different numbers in it, no game
        current_width = 0;
        continue;
    }

    if (current_width > 0) {
        // this column should consist of the same numbers as the one before
        if (matrix[i][j] != matrix[i][j - 1]) {
            current_width = 1; // start streak anew, from the current column
            continue;
        }
    }

    ++current_width;
    if (current_width >= width) {
        // we've found a rectangle!
    }
}

In the example above (i = 1) current_width after each iteration will be 0, 0, 1, 2, 3, 0, 0.

Now, we need to iterate over all possible i and we have a solution.

Coachman answered 11/1, 2011 at 11:36 Comment(29)
Argh... Java! Unclean, Unclean!Mcdade
This is close enough to what I was thinking that I probably wont bother.Dilantin
@Kevin It could as well be C or C++ or any of their family of languages :) Anyway, it should be easily convertable to any language, the only C-specific construct here is a 'for' loop. Just consider it pseudo-code.Coachman
@Nikita Honestly, there's a reason why people use these tags when asking questions! If nothing else, your use of the current_width mutable variable totally flies in the face of idiomatic Scala, and means that this solution can't be scaled to use the parallel features shortly due to be released with version 2.9. Pushing a serial and highly imperative algorithm like this on someone looking to grow their knowledge of a functional programming language is somewhat inappropriate.Mcdade
@Kevin I'm sorry if decent O(n^2) solution can't be implemented in Scala due it its 'features'. And I'm sorry that for decades in CS/algorithm works imperative pseudo-code is used almost exclusively. Hope you'll ever forgive me.Coachman
@Kevin As for 'growing', OP didn't specify in which area he wanted to 'grow': algorithms or Scala. But staying with brute-force O(n^4) solution surely doesn't look like growth to me.Coachman
@Nikita: My GPU has 256 cores... Moore's law tells us that the need for parallelization in software will double every 18 months. Individual cores are set to get slower for reasons of thermal/power efficiency. Big-O is inherently a measurement of serial complexity and doesn't account for this, nor does it recognise performant attributes of an algorithm such as cache coherency. An immutable, declarative solution can process every row, simultaneously, followed by every column (simultaneously) using something like this compiler plugin: code.google.com/p/scalacl/wiki/ScalaCLPluginMcdade
"Big-O is inherently a measurement of serial complexity" No, it's not. It's a measurement of the required number of operations. With a fixed number of cores, an O(n^4) algorithm is still O(n^4). And an immutable, declarative solution cannot every row simultaneously unless the number of cores is greater than or equal to n, unless I'm missing somethingGadgetry
@Kevin That reminds me of another argument I've heard long ago. "We don't need to learn counting: in future computers will do it for us!" :) And btw, this algorithm can be parallelized as well: processing of different rows can be done simultaneously. (In case of a computation - columns)Coachman
@Nikita, why not also produce a matrix of how many cells to the right have the same number? Then finding the tiles is just traversing the two matrices doing two comparisons for width and height (and that bit could be done functionally :-P)Gadgetry
@Paul Of course it could, you basically just summarised the algorithm in my answer :)Mcdade
@Paul I don't know about 'two comparisons'. You can check top and left borders of rectangle this way, but not insides (if I understand you correctly). Anyway, I do believe this can be rewritten as some kind of recursion in scala, only it would be ugly as hell :)Coachman
@Nikita sigh, no, I don't expect computers to count for us. But I do expect us to stop focusing on outdated methods of complexity measurement that don't allow for a certain amount of duplicated effort in order to massively improve the concurrency of an algorithm. The relevant exponential measurement here is core count vs time.Mcdade
@Kevin, Big-O is not outdated by concurrency. You're talking about the constant factor, not the Big-O behaviour, which remains relevant even if for some dimensions a worse big-O is outweighed by a better constant. That's as true for parallel solutions as it is for serial ones.Gadgetry
@Nikita the "two comparisons" that Paul referred two would be the row comparison, then a column comparison of (cell value, row comparison) tuples. I know this, because it's exactly the algorithm that I wrote. I use the same runLengths method for both rows and columns, this method being a non-ugly example of "some kind of recursion in Scala".Mcdade
@Kevin I wasn't talking about counting, but about predicting future. With average consumer laptop having 2-4 cores (the number which didn't seem to respect Moore's law recently), times when difference between 10^6 operations and 10^12 operations isn't important aren't coming soon. I can't believe I even have to argue about this. (Not to mention, there's absolutely nothing that prevents parallelization in Java or C, as I noted above.)Coachman
@Kevin It's nice that you explain your idea here, but you'd surely get more attention (and votes) if you did it in your post (preferably with some pseudo-code and without Scala-specific syntax). Scala isn't widely spoken here. Sad truth.Coachman
@Nikita When the Question is tagged both with "scala" and "scala 2.8", you want to believe that it's spoken around here... I suggest you go back and read my post wherein I do describe how it's working, how did you feel able to defame it as O(n^4) if you don't even know what it does?Mcdade
@Nikita incidentally, the average laptop has way more than 2 cores, you're forgetting about the GPU...Mcdade
@Kevin Now, that's funny. I noted that OP's code has O(n^4) complexity and you started proving that there's nothing wrong with O(n^4). I have no idea about your solution's complexity, although I did try to understand it.Coachman
@Kevin And how do you use GPU or videcard for algorithmic computations? I'm really interested.Coachman
@Kevin What exactly are we arguing about, anyway?Coachman
@Nikita I believe we're arguing over the relative importance of concurrent scalability vs minimising serial run-time for an algorithm... Also, here's the compiler plugin for running (some) scala code on the GPU: code.google.com/p/scalacl/wiki/ScalaCLPluginMcdade
@Nikita Thanks for your solution! I just benchmarked it and it outperforms mine by miles. Memory footprint is obviously higher due to the creation of matrix A but this is acceptable. I haven't implemented Kevin's approach yet, so this is what I'll do next.Chamade
@Nikita lol.... I just made a bigger benchmark, and when I use yours in my application (several hundred times) it takes 6-7secs in total, and my original solution takes 3065secs :D So yours is 400-500x faster... can this really be?? Of course these measurements are by no means scientific, but still gives an ideaChamade
@neo, 500 is less than 25^2. So a O(N^2) algorithm can be 500 times faster than an O(N^4) for N as small as 25, other things being equal. Those powers get big quickly. for N = 3000 (your bigger matrix) it could be 9 million times faster!Gadgetry
@Nikita By n^2, do you mean in a n x n matrix?Discernible
@Daniel Yes. You could say O(W*H), but it's not likely you'll have different restrictions for W and H.Coachman
@Nikita I don't know how I could've overseen it during all my testing etc. but there is a very little error in your code logic. The third current_width = 0; where it checks for same numbers has to be = 1 !!! Otherwise it always skips one "good" column (the check whether its height is big enough came earlier, so that's ok -- doesn't have to check again and can simply set it to 1)Chamade
M
8

First, a couple of helper functions:

//count the number of elements matching the head
def runLength[T](xs:List[T]) = xs.takeWhile(_ == xs.head).size

//pair each element with the number of subsequent occurrences
def runLengths[T](row:List[T]) : List[(T,Int)] = row match {
  case Nil => Nil
  case h :: t => (h, runLength(row)) :: runLengths(t)
}
//should be optimised for tail-call, but easier to understand this way

//sample input: 1,1,2,2,2,3,4,4,4,4,5,5,6
//output: (1,2), (1,1), (2,3), (2,2), (2,1), (3,1), (4,4), (4,3), (4,2), (4,1), (5,2), (5,1), (6,1)

This can then be used against each row in the grid:

val grid = List(
  List(0,0,0,0),
  List(0,1,1,0),
  List(0,1,1,0),
  List(0,0,0,0))

val stage1 = grid map runLengths
// returns stage1: List[List[(Int, Int)]] =
// 0,4  0,3  0,2  0,1
// 0,1  1,2  1,1  0,1
// 0,1  1,2  1,1  0,1
// 0,4  0,3  0,2  0,1

Then having done the horizontal, the rows, we now perform exactly the same operation on the columns. This uses the transpose method available in the Scala standard collection library to exchange rows<->columns, as per the mathematical matrix operation of the same name. We also transpose back once this is done.

val stage2 = (stage1.transpose map runLengths).transpose
// returns stage2: List[List[((Int, Int), Int)]] =
// (0,4),1  (0,3),1  (0,2),1  (0,1),4
// (0,1),2  (1,2),2  (1,1),2  (0,1),3
// (0,1),1  (1,2),1  (1,1),1  (0,1),2
// (0,4),1  (0,3),1  (0,2),1  (0,1),1

What does this mean? Taking one element: (1,2),2, it means that that cell contains the value 1, and scanning to the right that there are 2 adjacent cells in the row containing 1. Scanning down, there are two adjacent cells with the same property of containing the value 1 and having the same number of equal values to their right.

It's a little clearer after tidying up, converting nested tuples of the form ((a,b),c) to (a,(b,c)):

val stage3 = stage2 map { _.map {case ((a,b),c) => a->(b,c) } }
//returns stage3: List[List[(Int, (Int, Int))]] =
//  0,(4,1)  0,(3,1)  0,(2,1)  0,(1,4)
//  0,(1,2)  1,(2,2)  1,(1,2)  0,(1,3)
//  0,(1,1)  1,(2,1)  1,(1,1)  0,(1,2)
//  0,(4,1)  0,(3,1)  0,(2,1)  0,(1,1)

Where 1,(2,2) refers to a cell containing the value 1, and being at the top-left corner of a 2x2 rectangle of similarly-valued cells.

From here, it's trivial to spot rectangles of the same size. Though a little more work is necessary if you also want to exclude areas that are the subset of a larger rectangle.

UPDATE: As written, the algorithm doesn't work well for cases like the cell at (0,0), which belongs to two distinct rectangles (1x4 and 4x1). If needed, this is also solvable using the same techniques. (do one pass using map/transpose/map/transpose and a second with transpose/map/transpose/map, then zip and flatten the results).

It would also need modifying if the input might contain adjoining rectangles containing cells of the same value, such as:

0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0
0 0 1 1 1 0 0 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 0 0 0 0 0 0

Putting it all together, and cleaning up a little:

type Grid[T] = List[List[T]]

def runLengths[T](row:List[T]) : List[(T,Int)] = row match {
  case Nil => Nil
  case h :: t => (h -> row.takeWhile(_ == h).size) :: runLengths(t)
}

def findRectangles[T](grid: Grid[T]) = {
  val step1 = (grid map runLengths)
  val step2 = (step1.transpose map runLengths).transpose
  step2 map { _ map { case ((a,b),c) => (a,(b,c)) } }
}

UPDATE2

Hold onto your hats, this is a big one...

Before writing a single line of new functionality, we'll first refactor a bit, pulling some of the methods into Ops classes with implicit conversions, so they can be used as though they were methods defined on the underlying collection types:

import annotation.tailrec

class RowOps[T](row: List[T]) {
  def withRunLengths[U](func: (T,Int)=>U) : List[U] = {
    @tailrec def recurse(row:List[T], acc:List[U]): List[U] = row match {
      case Nil => acc
      case head :: tail =>
        recurse(
          tail,
          func(head, row.takeWhile(head==).size) :: acc)
    }
    recurse(row, Nil).reverse
  }

  def mapRange(start: Int, len: Int)(func: T=>T) =
    row.splitAt(start) match {
      case (l,r) => l ::: r.take(len).map(func) ::: r.drop(len)
    }
}

implicit def rowToOps[T](row: List[T]) = new RowOps(row)

This adds a withRunLengths method to lists. One notable difference here is that instead of returning a List of (value, length) pairs the method accepts, as a parameter, a function to create an output value for each such pair. This will come in handy later...

type Grid[T] = List[List[T]]

class GridOps[T](grid: Grid[T]) {
  def deepZip[U](other: Grid[U]) = (grid zip other) map { case (g,o) => g zip o}
  def deepMap[U](f: (T)=>U) = grid map { _ map f}
  def mapCols[U](f: List[T]=>List[U]) = (grid.transpose map f).transpose
  def height = grid.size
  def width = grid.head.size
  def coords = List.tabulate(height,width){ case (y,x) => (x,y) }
  def zipWithCoords = deepZip(coords)
  def deepMapRange(x: Int, y: Int, w: Int, h: Int)(func: T=>T) =
    grid mapRange (y,h){ _.mapRange(x,w)(func) }
}

implicit def gridToOps[T](grid: Grid[T]) = new GridOps(grid)

There shouldn't be any surprises here. The deepXXX methods avoid having to write constructs of the form list map { _ map { ... } }. tabulate may also be new to you, but its purpose is hopefully obvious from the use.

Using these, it becomes trivial to define functions for finding the horizontal and vertical run lengths over a whole grid:

def withRowRunLengths[T,U](grid: Grid[T])(func: (T,Int)=>U) =
  grid map { _.withRunLengths(func) }

def withColRunLengths[T,U](grid: Grid[T])(func: (T,Int)=>U) =
  grid mapCols { _.withRunLengths(func) }

Why 2 parameter blocks and not one? I'll explain shortly.

I could have defined these as methods in GridOps, but they don't seem appropriate for general purpose use. Feel free to disagree with me here :)

Next, define some input...

def parseIntGrid(rows: String*): Grid[Int] =
  rows.toList map { _ map {_.toString.toInt} toList }

val input: Grid[Int] = parseIntGrid("0000","0110","0110","0000")

...another useful helper type...

case class Rect(w: Int, h: Int)
object Rect { def empty = Rect(0,0) }

as an alternative to a tuple, this really helps with debugging. Deeply nested parenthesis are not easy on the eyes. (sorry Lisp fans!)

...and use the functions defined above:

val stage1w = withRowRunLengths(input) {
  case (cell,width) => (cell,width)
}

val stage2w = withColRunLengths(stage1w) {
  case ((cell,width),height) => Rect(width,height)
}


val stage1t = withColRunLengths(input) {
 case (cell,height) => (cell,height)
}

val stage2t = withRowRunLengths(stage1t) {
  case ((cell,height),width) => Rect(width,height)
}

All of the above blocks should be one-liners, but I reformatted for StackOverflow.

The outputs at this stage are just grids of Rectangles, I've intentionally dropped any mention of the actual value that the rectangle is comprised of. That's absolutely fine, it's easy enough to find from its co-ordinates in the grid, and we'll be recombining the data in just a brief moment.

Remember me explaining that RowOps#withRunLengths takes a function as a parameter? Well, this is where it's being used. case (cell,width) => (cell,width) is actually a function that takes the cell value and run length (calling them cell and width) then returns the (cell,width) Tuple.

This is also why I used two parameter blocks in defining the functions, so the second param can be passed in { braces }, and makes the whole thing all nice and DSL-like.

Another very important principle illustrated here is that the type inferencer works on parameter blocks in succession, so for this (remember it?):

def withRowRunLengths[T,U](grid: Grid[T])(func: (T,Int)=>U)

The type T will be determined by the supplied grid. The compiler then knows the input type for the function supplied as the second argument, - Int in this case, as the first param was a Grid[Int] - which is why I was able to the write case (cell,width) => (cell,width) and not have to explicitly state anywhere that cell and width are integers. In the second usage, a Grid[(Int,Int)] is supplied, this fits the closure case ((cell,width),height) => Rect(width,height) and again, it just works.

If that closure had contained anything that wouldn't work for the underlying type of the grid then the compiler would have complained, this is what type safety is all about...

Having calculated all the possible rectangles, all that remains is to gather up the data and present it in a format that's more practical for analysing. Because the nesting at this stage could get very messy, I defined another datatype:

case class Cell[T](
  value: T,
  coords: (Int,Int) = (0,0),
  widest: Rect = Rect.empty,
  tallest: Rect = Rect.empty
)

Nothing special here, just a case class with named/default parameters. I'm also glad I had the foresight to define Rect.empty above :)

Now mix the 4 datasets (input vals, coords, widest rects, tallest rects), gradually fold into the cells, stir gently, and serve:

val cellsWithCoords = input.zipWithCoords deepMap {
  case (v,(x,y)) => Cell(value=v, coords=(x,y))
}

val cellsWithWidest = cellsWithCoords deepZip stage2w deepMap {
  case (cell,rect) => cell.copy(widest=rect)
}

val cellsWithWidestAndTallest = cellsWithWidest deepZip stage2t deepMap {
  case (cell,rect) => cell.copy(tallest=rect)
}

val results = (cellsWithWidestAndTallest deepMap {
  case Cell(value, coords, widest, tallest) =>
    List((widest, value, coords), (tallest, value, coords))
  }
).flatten.flatten

The last stage is interesting there, it converts each cell to a size-2 list of (rectangle, value, coords) tuples (size 2 because we have both widest and tallest rects for each cell). Calling flatten twice then takes the resulting List[List[List[_]]] back down to a single List. There's no need to retain the 2D structure any more, as the necessary coordinates are already embedded in the results.

Voila!

It's now up to you what you now do with this List. The next stage is probably some form of sorting and duplicate removal...

Mcdade answered 11/1, 2011 at 10:40 Comment(23)
Hmmm.... don't know if it's just the Scala plugin for eclipse but it doesn't let me compile it because of this error: "too many arguments for method map2dRowFirst: (grid: HelpersTest.this.Grid[T])(func: (List[T]) => List[T])List[List[T]]" I haven't understood yet what the "2nd" parameter list in map2dRowFirst is good for, I mean in general, not just in your code. Can you shed some light on this?Chamade
@Kevin You wrote that your code would need modifying if the input contains adjoining rectangles. As this is absolutely possible in my case, can you give me a hint on this one?Chamade
@neo my mistake, I copied mismatched versions of the functions from the REPL. It should all be good now.Mcdade
@neo adjoining rectangles are mostly tricky because they can be perceived in different ways. For example, would you say that the sample I gave in my answer represents 2 overlapping rectangles, or 3 that don't overlap?Mcdade
@Kevin now it complains "type mismatch; found : (List[Nothing]) => List[(Nothing, Int)] required: (List[Any]) => List[Any]" for the runLengths parameter in the last line, which seems correct, as List[(T,Int)] is not List[T]Chamade
@Kevin regarding adjoining rectangles, it depends of the searched size of course. For my purposes I need overlapped rectangles as well. So if width should be 3 and height 2, then there are 8 rectangles. Or for width 3 and height 3 there are 5.Chamade
@neo all fixed now. Shame I left my repl session on the other computer, I'd life to go back and check what I left out.Mcdade
@Kevin just one more question, I tried to implement the two passes and wanted to zip and flatten the resulting lists, but I don't really know how to do it right... I tried it with List.flatten(a zip b) where a and b are final lists. I get for flatten: "type mismatch; found : List[(List[(T, (Int, Int))], List[(T, (Int, Int))])] required: List[List[?]]"Chamade
@Kevin I understood now why the algorithm in its current form doesn't support overlapping rectangles. Do you see any easy way or would it then be better to have a more different approach?Chamade
@Neo you need to run it twice, swapping step1 and step2 the second time around. This'll give you 2 rectangles for each cell: the tallest, and the widest. I'll update more fully later today.Mcdade
@Neo Enjoy! I'd be interested to see how it performs, recommend that you run the latest JVM with the -server flag to turn on escape analysis.Mcdade
@Kevin Thanks for your effort! I really appreciate it and will try to understand every line. I nearly integrated your algorithm but as I had to use List instead of Array I came to a problem which is the only missing link before I can do benchmarks. I hope you can help me (probably a one or two-liner). Goal: For a chosen rectangle, say size 3x4 at x=0, y=1 with value=1 I now need to add a number, say 10, to all cells of this rectangle in the grid (value afterwards=1+10=11). Before, I used a nested x-y for-loop due to easy O(1) access on Arrays, but how to do it for lists? Thanks so much :)Chamade
@Kevin I didn't know the -server flag, seems to really speed up all algorithms (my original one, the one from Nikita, and yours probably too when I'll benchmark it later). The only "downside" is memory consumption which might be a problem for my 2GB memory workstation, but we'll see.Chamade
@neo adding values like that is not trivial. A 4x5 rectangle would internally contain 4 3x4 rectangles. How would you want to increment the values in such a case?Mcdade
@Kevin I'm not always directly referring to one of the widest or tallest rectangles found by your algorithm, but rather to a specific sub-rectangle of it (which might be the same). And all cells in this sub-rectangle have the same value V which should be changed to V+Z. See here for visualization: pastebin.com/fLRFgFERChamade
@Neo try using the deepMapRange method on GridOps, just added to my answer. e.g. input.deepMapRange(2,3,3,3){ x=>5 }Mcdade
@Kevin Thanks! :) I'm just integrating it, I think you missed a dot in deepMapRange between "grid mapRange(y,h)....", at least Eclipse complains without one but don't know why infix notation shouldn't work here, maybe because of currying?Chamade
Kevin, your run length method is increasing complexity unnecessarily. A simple row.init.foldRight((row.last, List(1))) { case (curr, (prev, acc @ (head :: _))) => (curr, (if (curr == prev) head + 1 else 1) :: acc) }._2 would compute it much more efficiently, though reversing row first and using foldLeft would be better, of course. I see the elegance of takeWhile, but not when you keep counting things that have been counted already. This is a complexity issue, that really should be fixed.Discernible
@daniel no arguments from me. I'll stick this back in the answer, with a bit of explanation how it worksMcdade
@neo As I wrote it, yes. I'm leaving all the hard work of making a parameterized version to Kevin. :-)Discernible
sorry, just removed my own comment, thought it was wrong, here again: "@Daniel: Wouldn't your alternate version force the use of List[Int] instead of List[T]?"Chamade
I'll stick with tail recursion to remove the object creation cost of that closure, just update it to use the same logic as Daniel's fold. This is clearly the bottleneck in the the algorithm. Ironically, it's also the only part that's highly serial in nature.Mcdade
Actually, I'll also get some specialization in there to cut down on boxing costs. Expect to see this on github soon, it's a nice case study, will be more readable there, and I haven't even started on any optimization beyond making it parallelizable.Mcdade
C
5

You can do it in O(n^2) relatively easy.
First, some-precalculations. For each cell in matrix, calculate how many consecutive cells below it have the same number.
For your example, resulting matrix a (couldn't think of better name :/) will look like this

0 0 0 0 0 2 2
1 1 2 2 2 1 1
0 0 1 1 1 0 0
1 1 0 0 0 1 1
0 0 0 0 0 0 0

It can be produced in O(n^2) easily.

Now, for each row i, let's find all rectangles with top side in row i (and bottom side in row i + height - 1).
Here's an illustration for i = 1

0 0 0 0 0 0 0
-------------
4 4 2 2 2 0 0
4 4 2 2 2 0 0
0 0 2 2 2 1 1
-------------
0 0 0 0 0 1 1

Now, the main idea

int current_width = 0;
for (int j = 0; j < matrix.width; ++j) {
    if (a[i][j] < height - 1) {
        // this column has different numbers in it, no game
        current_width = 0;
        continue;
    }

    if (current_width > 0) {
        // this column should consist of the same numbers as the one before
        if (matrix[i][j] != matrix[i][j - 1]) {
            current_width = 1; // start streak anew, from the current column
            continue;
        }
    }

    ++current_width;
    if (current_width >= width) {
        // we've found a rectangle!
    }
}

In the example above (i = 1) current_width after each iteration will be 0, 0, 1, 2, 3, 0, 0.

Now, we need to iterate over all possible i and we have a solution.

Coachman answered 11/1, 2011 at 11:36 Comment(29)
Argh... Java! Unclean, Unclean!Mcdade
This is close enough to what I was thinking that I probably wont bother.Dilantin
@Kevin It could as well be C or C++ or any of their family of languages :) Anyway, it should be easily convertable to any language, the only C-specific construct here is a 'for' loop. Just consider it pseudo-code.Coachman
@Nikita Honestly, there's a reason why people use these tags when asking questions! If nothing else, your use of the current_width mutable variable totally flies in the face of idiomatic Scala, and means that this solution can't be scaled to use the parallel features shortly due to be released with version 2.9. Pushing a serial and highly imperative algorithm like this on someone looking to grow their knowledge of a functional programming language is somewhat inappropriate.Mcdade
@Kevin I'm sorry if decent O(n^2) solution can't be implemented in Scala due it its 'features'. And I'm sorry that for decades in CS/algorithm works imperative pseudo-code is used almost exclusively. Hope you'll ever forgive me.Coachman
@Kevin As for 'growing', OP didn't specify in which area he wanted to 'grow': algorithms or Scala. But staying with brute-force O(n^4) solution surely doesn't look like growth to me.Coachman
@Nikita: My GPU has 256 cores... Moore's law tells us that the need for parallelization in software will double every 18 months. Individual cores are set to get slower for reasons of thermal/power efficiency. Big-O is inherently a measurement of serial complexity and doesn't account for this, nor does it recognise performant attributes of an algorithm such as cache coherency. An immutable, declarative solution can process every row, simultaneously, followed by every column (simultaneously) using something like this compiler plugin: code.google.com/p/scalacl/wiki/ScalaCLPluginMcdade
"Big-O is inherently a measurement of serial complexity" No, it's not. It's a measurement of the required number of operations. With a fixed number of cores, an O(n^4) algorithm is still O(n^4). And an immutable, declarative solution cannot every row simultaneously unless the number of cores is greater than or equal to n, unless I'm missing somethingGadgetry
@Kevin That reminds me of another argument I've heard long ago. "We don't need to learn counting: in future computers will do it for us!" :) And btw, this algorithm can be parallelized as well: processing of different rows can be done simultaneously. (In case of a computation - columns)Coachman
@Nikita, why not also produce a matrix of how many cells to the right have the same number? Then finding the tiles is just traversing the two matrices doing two comparisons for width and height (and that bit could be done functionally :-P)Gadgetry
@Paul Of course it could, you basically just summarised the algorithm in my answer :)Mcdade
@Paul I don't know about 'two comparisons'. You can check top and left borders of rectangle this way, but not insides (if I understand you correctly). Anyway, I do believe this can be rewritten as some kind of recursion in scala, only it would be ugly as hell :)Coachman
@Nikita sigh, no, I don't expect computers to count for us. But I do expect us to stop focusing on outdated methods of complexity measurement that don't allow for a certain amount of duplicated effort in order to massively improve the concurrency of an algorithm. The relevant exponential measurement here is core count vs time.Mcdade
@Kevin, Big-O is not outdated by concurrency. You're talking about the constant factor, not the Big-O behaviour, which remains relevant even if for some dimensions a worse big-O is outweighed by a better constant. That's as true for parallel solutions as it is for serial ones.Gadgetry
@Nikita the "two comparisons" that Paul referred two would be the row comparison, then a column comparison of (cell value, row comparison) tuples. I know this, because it's exactly the algorithm that I wrote. I use the same runLengths method for both rows and columns, this method being a non-ugly example of "some kind of recursion in Scala".Mcdade
@Kevin I wasn't talking about counting, but about predicting future. With average consumer laptop having 2-4 cores (the number which didn't seem to respect Moore's law recently), times when difference between 10^6 operations and 10^12 operations isn't important aren't coming soon. I can't believe I even have to argue about this. (Not to mention, there's absolutely nothing that prevents parallelization in Java or C, as I noted above.)Coachman
@Kevin It's nice that you explain your idea here, but you'd surely get more attention (and votes) if you did it in your post (preferably with some pseudo-code and without Scala-specific syntax). Scala isn't widely spoken here. Sad truth.Coachman
@Nikita When the Question is tagged both with "scala" and "scala 2.8", you want to believe that it's spoken around here... I suggest you go back and read my post wherein I do describe how it's working, how did you feel able to defame it as O(n^4) if you don't even know what it does?Mcdade
@Nikita incidentally, the average laptop has way more than 2 cores, you're forgetting about the GPU...Mcdade
@Kevin Now, that's funny. I noted that OP's code has O(n^4) complexity and you started proving that there's nothing wrong with O(n^4). I have no idea about your solution's complexity, although I did try to understand it.Coachman
@Kevin And how do you use GPU or videcard for algorithmic computations? I'm really interested.Coachman
@Kevin What exactly are we arguing about, anyway?Coachman
@Nikita I believe we're arguing over the relative importance of concurrent scalability vs minimising serial run-time for an algorithm... Also, here's the compiler plugin for running (some) scala code on the GPU: code.google.com/p/scalacl/wiki/ScalaCLPluginMcdade
@Nikita Thanks for your solution! I just benchmarked it and it outperforms mine by miles. Memory footprint is obviously higher due to the creation of matrix A but this is acceptable. I haven't implemented Kevin's approach yet, so this is what I'll do next.Chamade
@Nikita lol.... I just made a bigger benchmark, and when I use yours in my application (several hundred times) it takes 6-7secs in total, and my original solution takes 3065secs :D So yours is 400-500x faster... can this really be?? Of course these measurements are by no means scientific, but still gives an ideaChamade
@neo, 500 is less than 25^2. So a O(N^2) algorithm can be 500 times faster than an O(N^4) for N as small as 25, other things being equal. Those powers get big quickly. for N = 3000 (your bigger matrix) it could be 9 million times faster!Gadgetry
@Nikita By n^2, do you mean in a n x n matrix?Discernible
@Daniel Yes. You could say O(W*H), but it's not likely you'll have different restrictions for W and H.Coachman
@Nikita I don't know how I could've overseen it during all my testing etc. but there is a very little error in your code logic. The third current_width = 0; where it checks for same numbers has to be = 1 !!! Otherwise it always skips one "good" column (the check whether its height is big enough came earlier, so that's ok -- doesn't have to check again and can simply set it to 1)Chamade
D
5

I'll play the devil's advocate here. I'll show Nikita's exact algorithm written in a functional style. I'll also parallelize it, just to show it can be done.

First, computing consecutive cells with the same number below a cell. I made a slight change by returning all values plus one compared to Nikita's proposed output, to avoid a - 1 in some other part of the code.

def computeHeights(column: Array[Int]) = (
    column
    .reverse
    .sliding(2)
    .map(pair => pair(0) == pair(1))
    .foldLeft(List(1)) ( 
        (list, flag) => (if (flag) list.head + 1 else 1) :: list
    )
)

I'd rather use .view before reversing, but that doesn't work with the present Scala version. If it did, it would save repeated array creations, which ought to speed up the code a lot, for memory locality and bandwidth reasons if no other.

Now, all columns at the same time:

import scala.actors.Futures.future

def getGridHeights(grid: Array[Array[Int]]) = (
    grid
    .transpose
    .map(column => future(computeHeights(column)))
    .map(_())
    .toList
    .transpose
)

I'm not sure if the parallelization overhead will pay off here or not, but this is the first algorithm on Stack Overflow where it actually has a chance, given the non-trivial effort in computing a column. Here's another way of writing it, using the upcoming 2.9 features (it might work on Scala 2.8.1 -- not sure what :

def getGridHeights(grid: Array[Array[Int]]) = (
    grid
    .transpose
    .toParSeq
    .map(computeHeights)
    .toList
    .transpose
)

Now, the meat of Nikita's algorithm:

def computeWidths(height: Int, row: Array[Int], heightRow: List[Int]) = (
    row
    .sliding(2)
    .zip(heightRow.iterator)
    .toSeq
    .reverse
    .foldLeft(List(1)) { case (widths @ (currWidth :: _), (Array(prev, curr), currHeight)) =>
        (
            if (currHeight >= height && currWidth > 0 && prev == curr) currWidth + 1
            else 1
        ) :: widths
    }
    .toArray
)

I'm using pattern matching in this bit of code, though I'm concerned with its speed, because with all the sliding, and zipping and folding there's two many things being juggled here. And, speaking of performance, I'm using Array instead of IndexedSeq because Array is the only type in the JVM that is not erased, resulting in much better performance with Int. And, then, there's the .toSeq I'm not particular happy about either, because of memory locality and bandwidth.

Also, I'm doing this from right to left instead of Nikita's left to right, just so I can find the upper left corner.

However, is the same thing as the code in Nikita's answer, aside from the fact that I'm still adding one to current width compared to his code, and not printing the result right here. There's a bunch of things without clear origins here, though -- row, heightRow, height... Let's see this code in context -- and parallelized! -- to get the overall picture.

def getGridWidths(height: Int, grid: Array[Array[Int]]) = (
    grid
    .zip(getGridHeights(grid))
    .map { case (row, heightsRow) => future(computeWidths(height, row, heightsRow)) }
    .map(_())
)

And the 2.9 version:

def getGridWidths(height: Int, grid: Array[Array[Int]]) = (
    grid
    .toParSeq
    .zip(getGridHeights(grid))
    .map { case (row, heightsRow) => computeWidths(height, row, heightsRow) }
)

And, for the gran finale,

def findRectangles(height: Int, width: Int, grid: Array[Array[Int]]) = {
    val gridWidths = getGridWidths(height, grid)
    for {
        y <- gridWidths.indices
        x <- gridWidths(y).indices
        if gridWidths(y)(x) >= width
    } yield (x, y)
}

So... I have no doubt that the imperative version of Nikita's algorithm is faster -- it only uses Array, which is much faster with primitives than any other type, and it avoids massive creation of temporary collections -- though Scala could have done better here. Also, no closures -- as much as they help, they are slower than code without closures. At least until JVM grow something to help with them.

Also, Nikita's code can be easily parallelized with threads -- of all things! -- with little trouble.

But my point here is that Nikita's code is not particularly bad just because it uses arrays and a mutable variable here and there. The algorithm translates cleanly to a more functional style.

EDIT

So, I decided to take a shot at making an efficient functional version. It's not really fully functional because I use Iterator, which is mutable, but it's close enough. Unfortunately, it won't work on Scala 2.8.1, because it lacks scanLeft on Iterator.

There are two other unfortunate things here. First, I gave up on parallelization of grid heights, since I couldn't get around having at least one transpose, with all the collection copying that entails. There's still at least one copying of the data, though (see toArray call to understand where).

Also, since I'm working with Iterable I loose the ability to use the parallel collections. I do wonder if the code couldn't be made better by having grid be a parallel collection of parallel collections from the beginning.

I have no clue if this is more efficient than the previous version of not. It's an interesting question...

def getGridHeights(grid: Array[Array[Int]]) = (
    grid
    .sliding(2)
    .scanLeft(Array.fill(grid.head.size)(1)) { case (currHeightArray, Array(prevRow, nextRow)) =>
        (prevRow, nextRow, currHeightArray)
        .zipped
        .map { case (x, y, currHeight) =>  if (x == y) currHeight + 1 else 1 }
    }
)

def computeWidths(height: Int, row: Array[Int], heightRow: Array[Int]) = (
    row
    .sliding(2)
    .map { case Array(x, y) => x == y }
    .zip(heightRow.iterator)
    .scanLeft(1) { case (currWidth , (isConsecutive, currHeight)) =>
        if (currHeight >= height && currWidth > 0 && isConsecutive) currWidth + 1
        else 1
    }
    .toArray
)

import scala.actors.Futures.future

def getGridWidths(height: Int, grid: Array[Array[Int]]) = (
    grid
    .iterator
    .zip(getGridHeights(grid))
    .map { case (row, heightsRow) => future(computeWidths(height, row, heightsRow)) }
    .map(_())
    .toArray
)

def findRectangles(height: Int, width: Int, grid: Array[Array[Int]]) = {
    val gridWidths = getGridWidths(height, grid)
    for {
        y <- gridWidths.indices
        x <- gridWidths(y).indices
        if gridWidths(y)(x) >= width
    } yield (x - width + 1, y - height + 1)
}
Discernible answered 15/1, 2011 at 4:31 Comment(11)
Thanks! I always knew decent algorithms can be implemented in scala :) I'll look at it more when I have free time, maybe I'll even start understanding some scala.Coachman
@Daniel Wow, I didn't expect that! I find I very exciting to see all your ideas and now even with using the actor framework, until now I only read about it here and there.Chamade
@Daniel I just tried to test your code but the compiler doesn't like the last call to transpose in getGridHeights: "could not find implicit value for parameter asArray: (List[Int]) => Array[U]" and "not enough arguments for method transpose: (implicit asArray: (List[Int]) => Array[U])Array[Array[U]]. Unspecified value parameter asArray."Chamade
@neo Ok, I tested it. I played a while with this code, going back and forth with what collections to use. Originally, I was using Array throughout, on the assumption that there was going to be a lot of indexed access. When I finally finished the code, that turned out not to be the case, so I went back and tried to reduce collection transformations. I obviously forgot to test this particular bit of code (I was mostly using the 2.9 versioN), which is now fixed.Discernible
@neo By the way, I'm going mostly for clear code in functional style here, not speed. I was a bit thwarted because of two problems in Scala's library (one, arguably, a bug), which would also make the code faster. However, nothing will beat in speed while loops implementing Nikita's shown algorithm. You might gain from parallelization, but the same is possible in Nikita's code with very few changes.Discernible
@Daniel I also prefer clear code whenever possible, that's why I'm also interested in fast functional solutions like yours. I'll include benchmark results for your code in a sec.Chamade
well, I meant not clear code (because Nikita's code is clear as well), but functional style codeChamade
Try tail recursion for those compute methods, you should be able to squeeze out some more performance.Mcdade
@Kevin What I'd like to do would be use scanRight on Iterable. Unfortunately, no scanRight on Iterable, and scanRight isn't implemented correctly anyway.Discernible
@Daniel Have you tried using ScalaCL's compiler plugin ? It still has some rough edges, but it should be able to speed up many loops on arrays, including reduceLeft/scanLeft (no need for any OpenCL hardware, it's just loops rewriting). I suggest to use it's latest sbaz version (0.2.Beta7).Paver
@Paver No, I haven't. It was not my goal to achieve speed.Discernible

© 2022 - 2024 — McMap. All rights reserved.