What data structure using O(n) storage with O(log n) query time should I use for Range Minimum Queries?
Asked Answered
C

6

22

I'm stumped by the following homework question for an algorithms class:

Suppose that we are given a sequence of n values x1, x2 ... xn, and seek to quickly answer repeated queries of the form: given i and j, find the smallest value in xi... xj

Design a data structure that uses O(n) space and answers queries in O(log n) time.

Firstly, I'm uncertain whether a sequence refers to a sorted set, or an unsorted set - but since it doesn't say otherwise I'll assume sequence means unsorted.

So, I realize this obviously must involve a binary tree, if we're talking about O(log N) lookup time. So basically, I figure, you have a set S, and you insert each element in S into a binary tree. The problem is, the question basically wants me to come up with a way to answer queries where I'm given a range of indices into an unsorted set - and then determine the lowest value in that range in O(log N) time. How is that possible? Even if each number of the set is inserted into the tree, the best I can do is lookup any particular number in O(log N) time. This doesn't allow me to find the lowest value in an unsorted range of numbers in S.

Any suggestions?

Coronograph answered 23/2, 2010 at 0:41 Comment(3)
+1 for disclosing that you're working on homework and for making an attempt before asking. Welcome to SO!Nichol
+1: excellent homework questionRedstart
That's from Skiena's book "The Algorithm Design Manual".Cornstarch
J
8

OK, I think I have a good start for you, and I learned something new in the process.

I would take a look at the Wikipedia entry on Cartesian trees. I'm not going to tell you more than that for fear of doing your homework for you, but you seem like a smart guy so I figure you can work it out.

Thanks for helping me learn a new data structure, by the way!

Jollity answered 23/2, 2010 at 1:3 Comment(3)
Unfortunately, I don't think a Cartesian tree will give you an O(log N) search time -- at least not in the worst case, because they are not balanced.Bacteriophage
@comingstorm: You're right. However if we assume the input set is random it'll average out to O(log N). As long as the OP justifies his understanding of that in his homework I'm sure he'll get full credit.Jollity
I've evaluated algorithms exam answers as a TA. I would give partial credit. It doesn't solve the stated problem---but it does display understanding by solving a(n only) slightly easier problem. I would give less credit if there's no explanation: then it just looks like an oversight.Rochellrochella
J
13

If the set were sorted, you would not need a tree. The smallest element in the range [i,j] would have index i.

So suppose the elements of your sequence were stored in order of their indices at the leaves of a tree. Can you store any additional information (ahem, perhaps some kind of min and max) at each interior node to facilitate your query?

If so, then if the tree is balanced, and if you can answer your query by looking only at the two paths from the root to the two elements at {i,j}, then you will achieve your O(log N) lookup cost. Since a balanced binary tree with N leaves contains (2N-1) total nodes, you will also satisfy your O(N) storage limit.


MORE DETAIL: Consider computing the minimum value in the range [i,j].

At each interior node A of the tree, keep the minimum value for all the leaves beneath it. This can be computed bottom up when the tree is first built.

Now start at leaf i. Walk up the tree, keeping as your candidate minimum value the value at i or anything known to be both right of i and left of j. Stop one node below the mutual ancestor of i and j.

Start again at leaf j. Walk up the tree, once again keeping as your candidate minimum the value at j or anything known to be both left of j and right of i.

Your minimum for [i,j] is then the minimum of the two values you've computed. Computing the maximum is analogous. Total storage requirement is 2 value per interior node plus two pointers per interior node plus one value per leaf, which is N + 4(N-1) for a full tree.

The path you travel up the tree from leaf i, is the same path you would travel down the tree if you were searching for leaf i.


C# CODE FOR SEARCH:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace RangeSearch
{
    public class RangeSearch
    {
        int[] tree;
        int N;

        int LeafLocation(int leafNumber) { return leafNumber + N - 1; }
        int LeafValue(int leafNumber) { return tree[ LeafLocation(leafNumber)]; }
        int LeftChild(int x) { return 2*x + 1; }
        int RightChild(int x) { return 2*x + 2; }
        int Parent(int x) { return (x-1)/2; }
        bool IsPowerOf2(int x) { while (x > 0) { if (x == 1) return true; if ((x & 1) == 1 ) return false; x = x >> 1; } return false; }
        bool IsAncestorOf( int x, int y ) { if( x>y ) return false;  return x==y || IsAncestorOf(LeftChild(x), y) || IsAncestorOf(RightChild(x),y); } // note: violating time bound for legibility, can fix by storing min/max descendant index at each node

        public RangeSearch(params int[] vals)
        {
            if (!IsPowerOf2(vals.Length))
                throw new ArgumentException("this implementation restricted to N being power of 2");
            N = vals.Length;
            tree = new int[2 * N - 1];
            // the right half of the array contains the leaves
            vals.CopyTo(tree, N - 1);
            // the left half of the array contains the interior nodes, each of which holds the minimum of all its children
            for (int i = N - 2; i >= 0; i--)
                tree[i] = Math.Min(tree[LeftChild(i)], tree[RightChild(i)]);           
        }

        public int FindMin(int a, int b)
        {
            if( a>b )
                throw new ArgumentException( "FindMin expects a range [a,b] with a<=b" );
            int x = Walk( a, true, b);
            int y = Walk( b, false, a);
            return Math.Min(x, y);
        }

        int Walk( int leafNumber, bool leftSide, int otherLeafNumber )
        {
            int minSoFar =  LeafValue(leafNumber);
            int leafLocation = LeafLocation(leafNumber);
            int otherLeafLocation = LeafLocation(otherLeafNumber);
            int parent = Parent(leafLocation);
            bool cameFromLeft = (leafLocation == LeftChild(parent));
            return Walk2(minSoFar, parent, cameFromLeft, leftSide, otherLeafLocation);
        }
        int Walk2(int minSoFar, int node, bool cameFromLeft, bool leftSide, int otherLeafLocation)
        {
            if (IsAncestorOf(node, otherLeafLocation))
                return minSoFar;
            if (leftSide)
                minSoFar = !cameFromLeft ? minSoFar : Math.Min(minSoFar, tree[RightChild(node)]);
            else
                minSoFar = cameFromLeft ? minSoFar : Math.Min(minSoFar, tree[LeftChild(node)]);
            return Walk2(minSoFar, Parent(node), node == LeftChild(Parent(node)), leftSide, otherLeafLocation);
        }
    }
}

C# CODE TO TEST IT:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace RangeSearch
{
    class Program
    {
        static void Main(string[] args)
        {
            RangeSearch rngA = new RangeSearch(9, 3, 7, 1);
            System.Diagnostics.Trace.Assert(3 == rngA.FindMin(0, 2) );
            System.Diagnostics.Trace.Assert(1 == rngA.FindMin(0, 3));
            System.Diagnostics.Trace.Assert(1 == rngA.FindMin(1, 3));

            RangeSearch rngB = new RangeSearch(1, 7, 3, 9);
            System.Diagnostics.Trace.Assert(3 == rngB.FindMin(1, 3));
            System.Diagnostics.Trace.Assert(1 == rngB.FindMin(0, 3));
            System.Diagnostics.Trace.Assert(1 == rngB.FindMin(0, 2));

            RangeSearch rngC = new RangeSearch(17, 21, 77, 70, 58, 79, 79, 89);
            System.Diagnostics.Trace.Assert(21 == rngC.FindMin(1, 7));

            RangeSearch rngD = new RangeSearch(94, 78, 88, 72, 95, 97, 89, 83);
            System.Diagnostics.Trace.Assert(72 == rngD.FindMin(1, 6));

            RangeSearch rngE = new RangeSearch(0, 66, 6, 43, 34, 34, 63, 49);
            System.Diagnostics.Trace.Assert(34 == rngE.FindMin(3, 4));

            Random rnd = new Random();
            for (int i = 0; i < 1000000; i++)
            {
                int[] tmp = new int[64];
                for (int j = 0; j < tmp.Length; j++)
                    tmp[j] = rnd.Next(0, 100);
                int a = rnd.Next(0, tmp.Length);
                int b = rnd.Next(a, tmp.Length);
                RangeSearch rng = new RangeSearch(tmp);
                System.Diagnostics.Trace.Assert(Min(tmp, a, b) == rng.FindMin(a, b));
            }
        }

        static int Min(int[] ar, int a, int b)
        {
            int x = ar[a];
            for (int i = a + 1; i <= b; i++)
                x = Math.Min(x, ar[i]);
            return x;
        }

    }
}
Jock answered 23/2, 2010 at 0:58 Comment(13)
Doesn't that use O(N log N) storage?Gyrostatics
Does that violate the O(n) storage requirement? You'd need O(3n) storage -> O(n) storage right?Carlicarlick
This would not violate the O(N) storage requirement. A balanced binary tree with N elements contains a total of 2N-1 nodes. If each node has a fixed amount of data, the total storage is O(N).Jock
This answer rubs me the wrong way. Can you provide some pseudocode that more aptly demonstrates how you get the minimum value of a range in O(log N) time? I see your approach is very similar to the Cartesian tree method but I think you're overestimating the efficiency of lookup.Jollity
Now that I think about it, what you describe is a Segment Tree .. with O(N log N) storage. See: topcoder.com/…Kutenai
@hobodave, Welbog: No. This is O(n) storage and queries can be answered in O(log n) time. Hint: Think knockout tournament.Hectograph
@Welbog: I added additional detail.Jock
@Eric: Thanks for that info. I followed your algorithm (as I understand it) on the following set of data: 9 3 7 1, and tried to compute the minimum in the subset [9 3 7] and the answer I get is 1, which isn't in that set. But even so your statement "anything known to be both right of i and left of j" is vexing. How can you know something isn't from that set without traversing that set (which is an O(n) operation)?Jollity
@Jollity - start at 7. Val=7. Walk up to P(7,1). You turned right, so keep Val=7. Now stop because the next node is an ancestor of both 7 and 9. Now start at 9. Val=9. Walk up to P(9,3). You turned right, so Val=Min(9,3)=3. Now stop, because the next node is an ancestor of both 7 and 9. Your answer is Min(7,3)=3, as desired. If you got 1, it's probably because you didn't swap the "If I turned left" direction when on the right side of the tree; I'll clarify that line.Jock
@Eric: I think you and I have reverse definitions of "left" and "right", but even if it works for the set 9 3 7 1, it'll break down if I simply reverse it to 1 7 3 9 because now your idea of left and right doesn't account for that. At least it doesn't when I run your algorithm on paper.Jollity
@Jollity -- I have added code now. The logic is slightly more complicated than I had sketched, but the basic idea is the same.Jock
@Eric: I'm convinced. +1 for your effort. Took me a while to see how it works.Jollity
I had a similar idea, but to go from top to bottom: use the stored minimum value from a node with all leafs within the interval i-j, check further each branch with leafs contain i or j. But I think your algorithm is more clear and probably uses less steps, thanks!Cornstarch
J
8

OK, I think I have a good start for you, and I learned something new in the process.

I would take a look at the Wikipedia entry on Cartesian trees. I'm not going to tell you more than that for fear of doing your homework for you, but you seem like a smart guy so I figure you can work it out.

Thanks for helping me learn a new data structure, by the way!

Jollity answered 23/2, 2010 at 1:3 Comment(3)
Unfortunately, I don't think a Cartesian tree will give you an O(log N) search time -- at least not in the worst case, because they are not balanced.Bacteriophage
@comingstorm: You're right. However if we assume the input set is random it'll average out to O(log N). As long as the OP justifies his understanding of that in his homework I'm sure he'll get full credit.Jollity
I've evaluated algorithms exam answers as a TA. I would give partial credit. It doesn't solve the stated problem---but it does display understanding by solving a(n only) slightly easier problem. I would give less credit if there's no explanation: then it just looks like an oversight.Rochellrochella
K
1

The definition of sequence is an ordered set (not sorted).

Knowing that the set is ordered allows you to use a Cartesian Tree which is an ideal data structure for Range minimum queries.

Kutenai answered 23/2, 2010 at 0:49 Comment(6)
See that's what I thought initially. But then the question makes no sense, because if the set is sorted, then finding the lowest element in any subset of the sequence is an O(1) operation. Given indices i and j, the lowest element would always be at index i.Coronograph
Sorted and ordered aren't the same thing, @Channel72.Jollity
Ditto to Welbog: sorted and ordered aren't the same thing. Ordered just means that items occur in a fixed order - not that they are sorted. Example of ordered: {3, 1, 7, 2} is obviously not sorted. I would take the homework description to mean you are given an input of an array of unsorted numbers.Rattlebox
Why do we need those fancy algo such as Sparse Table to solve rmq problem? Why don't we just sort the set and then the query will be served in constant time, right?Catherincatherina
@Alcott: if the input array is [9 1 7 3] and I query on (1, 2)---which represents the subsequence [1 7]---then your answer should be 1. If you store the array [1 3 7 9] and I query (1, 2), how do you arrive at 1?Rochellrochella
I guess this link might be useful to explain the difference between an ordered collection and a sorted collection, #1084646Decipher
O
1

You can use 'Segment Tree'. In segment tress both update and query time is O(logn). Here are the links if you want to understand how it works.

  1. https://www.geeksforgeeks.org/segment-tree-set-1-sum-of-given-range/
  2. https://www.youtube.com/watch?v=ZBHKZF5w4YU
Obtect answered 28/7, 2018 at 3:8 Comment(0)
C
0

Have you considered an Interval Tree?

Looking at the wikipedia entry, it seems to closely match what you are asking for. http://en.wikipedia.org/wiki/Interval_tree

EDIT:

Yes, it appears that interval trees are unsuitable for this scenario...

Crista answered 23/2, 2010 at 0:51 Comment(2)
No, looking over the description of that structure it doesn't fit this situation at all. That structure stores intervals in a sorted nature, and the data here is unsorted, but rather ordered. Intervals don't make sense in this situation. Unless you have a fancy way to create intervals in mind?Jollity
still, the answer doesn't deserve a -3. +1Redstart
W
0

Some prodding:

Suppose you stored, somehow, the minimum of all well-aligned* subarrays of length 1, 2, 4, 8, ...? How few of these minima can you get away with looking at to return the correct answer? How can you store them so that you can retrieve them efficiently?

(* e.g. store min(x0...3) and min(x4...x7), but not min(x1...x4))

Watermelon answered 5/7, 2013 at 19:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.