After further thought, I've come up with a faster solution:
Create an immutable Range
struct with StartX
, StartY
, and EndY
properties.
Maintain a sparse array of current Range
s, with length equal to the height of the image, and a single nullable currentRange variable. This array will contain the range, if any, that starts at each Y coordinate
For each column,
- Clear
currentRange
For each cell in the column:
If there is no currentRange, or if there is but it ended before this cell (if currentRange.EndY <= y
), set currentRange to the y
'th element in the ranges array.
Thus, currentRange
will always refer to the range that contains the current cell, or null
if the current cell is not in a range.
If the current cell is 1
:
If we're in a range, do nothing – the range is continuing into the next column.
If we're not in a range, loop down the column until we hit a 0
or the end of a column, then create a new range stretching from the 1
we just found until the end of the loop.
Then, proceed to the next if (since the current cell is now 0
or the end of the column, and we're not in a range)
If you hit an existing range as you loop ahead to create the new range, you can either stop the new range and let the existing range continue below it (best for fuzzy edges), or close the existing range (see below) and let the new range stretch downwards to take over from the existing range.
- If the current cell is
0
:
- If we're in a range, close the range:
- Return a new rectangle stretching from the range's start X and Y to the current Y and the range's end X.
- Clear the range from the array of active ranges.
This algorithm is O(x * y)
in computation and O(y)
in space. I believe that this is the fastest solution to the problem.
You can also rotate this algorithm and scan rows rather than columns (so that ranges would be stretched downwards rather than rightwards), which will be O(x)
in storage.
Here is a C# implementation:
class BoxFinder {
class Range {
public Range(int startX, int startY, int endY) {
StartX = startX;
StartY = startY;
EndY = endY;
}
public int StartX { get; private set; }
public int StartY { get; private set; }
public int EndY { get; private set; }
}
public BoxFinder(int[,] data) {
Data = data;
Width = data.GetLength(0);
Height = data.GetLength(1);
ranges = new Range[Height];
}
public int[,] Data { get; private set; }
public int Width { get; private set; }
public int Height { get; private set; }
private Range[] ranges;
public IEnumerable<Rectangle> GetBoxes() {
for (int x = 0; x < Width; x++) {
Range currentRange = null;
for (int y = 0; y < Height; y++) {
Func<Range, Rectangle> CloseRange = r => {
currentRange = null;
ranges[r.StartY] = null;
return new Rectangle(
r.StartY,
r.StartX,
x - r.StartX,
r.EndY - r.StartY
);
};
if (currentRange == null || currentRange.EndY <= y)
currentRange = ranges[y];
if (Data[x, y] == 1) {
if (currentRange == null) {
int startY = y;
for (; y < Height && Data[x, y] == 1; y++) {
if (ranges[y] != null)
yield return CloseRange(ranges[y]);
//If there are fuzzy edges, break; instead
}
ranges[startY] = new Range(x, startY, y);
if (y == Height)
continue;
//Otherwise, continue to the next if in case a previous range just ended
}
}
//No else; we can get here after creating a range
if(Data[x, y] == 0) {
if (currentRange != null)
yield return CloseRange(currentRange);
}
}
}
}
}