How to randomly select a key based on its Integer value in a Map with respect to the other values in O(n) time?
E

6

7

If we have a Map<T, Integer>, let's say the Integer value represents "how many" Ts there are. Thus, I want to uniformly select a T based on its Integer value. If the map contains Strings with "a"=4 and "b"=6, then I want it so that 40% of the time "a" is selected and 60% of the time "b" is selected.

Most importantly, I'd like this in O(n), n being two (not ten) in my previous example. I originally made an ArrayList containing each key repeated as many times as its value (and simply returned a random index into it), but this process is not only very slow, it is also completely counterintuitive given what the Map<T, Integer> represents.

Eddaeddana answered 6/3, 2011 at 17:25 Comment(1)
Well, it's random with a non-uniform probability-distribution-function.Dustindustman
R
12

Sorry for the delay, but I think I have a relatively elegant solution with O(n lg n) construction time and O(lg n) fetch-a-random-element time. Here goes.


WeightedProbMap: This class implements the random element generator. It is constructed based on an Iterable; see Test.java below.

import java.util.Random;
import java.util.SortedMap;
import java.util.TreeMap;

class WeightedProbMap<EltType>  {
    private SortedMap<Integer, EltType> elts = new TreeMap<Integer, EltType>();
    private Random rand = new Random();
    private int sum = 0;

    // assume: each weight is > 0; there is at least one element;
    //         elements should not be repeated
    // ensure: this.elts maps cumulative weights to elements;
    //         this.sum is the total weight
    public WeightedProbMap(Iterable<Pair<Integer, EltType>> weights) {
        for (Pair<Integer, EltType> e : weights) {
            this.elts.put(this.sum, e.second);
            this.sum += e.first;
        }
    }

    // assume: this was initialized properly (cf. constructor req)
    // ensure: return an EltType with relative probability proportional
    //         to its associated weight
    public EltType nextElt() {
        int index = this.rand.nextInt(this.sum) + 1;
        SortedMap<Integer, EltType> view = this.elts.headMap(index);
        return view.get(view.lastKey());
    }
}

Pair.java: Just a simple Pair class.

class Pair<X, Y> {
    public Pair(X x, Y y) {
        first = x;
        second = y;
    }

    X first;
    Y second;
}

Test.java: This is a very simple test harness for the WeightedProbMap (WPM) class. We build an ArrayList of elements with associated weights, use that to construct a WPM, and then get 10,000 samples from the WPM to see if elements appear with the expected frequency.

import java.util.ArrayList;

class Test {
    public static void main(String argc[]) {
        ArrayList<Pair<Integer, String> > elts = new ArrayList<Pair<Integer, String>>();
        elts.add(new Pair<Integer, String>(20, "Hello"));
        // elts.add(new Pair<Integer, String>(70, "World"));
        // elts.add(new Pair<Integer, String>(10, "Ohai"));

        WeightedProbMap<String> wpm = new WeightedProbMap<String>(elts);

        for (int i = 0; i < 10000; ++i) {
            System.out.println(wpm.nextElt());
        }
    }
}

Testing this:

  1. Uncomment one or both of the elts.add(...) lines in Test.java.
  2. Compile with:

    $ javac Pair.java WeightedProbMap.java Test.java

  3. Run with (for example, in Unix):

    $ java Test | grep "Hello" | wc -l

This will give you the count for that particular execution.


Explanation:

constructor: The WeightedProbMap (WPM) class uses a java.util.SortedMap to map cumulative weights to elements. A graphical explanation:

The constructor takes weights...     ...and creates a mapping from the
      3 +---+                            number line:
        |   | 
  2 +---+   +---+ 2                   0      2         5      7
    |   |   |   |                     +------+---------+------+
    |   |   |   |                     |   X  |    Y    |   Z  |
  --+---+---+---+--                   +------+---------+------+
      X   Y   Z

nextElt(): A SortedMap stores its data by key order, which allows it to cheaply provide 'views' of subsets of the map. In particular, the line

SortedMap<Integer, EltType> view = this.elts.headMap(index)

returns a view of the original map (this.elts) with only the keys that are strictly smaller than index. This operation (headMap) is constant time: view takes O(1) time to construct, and if you were to change this.elts later on, the changes would be reflected in view as well.

Once we create the view of everything less than a random number, we now just have to find the greatest key in that subset. We do that with SortedMap.lastKey(), which, for a TreeMap, should take \Theta(lg n) time.

Rompish answered 6/3, 2011 at 19:58 Comment(1)
As a side note: this approach can be extended in a few ways. If you want to increase the weight of a particular element, you can simply add additional entries to this.elts (and update this.sum accordingly). Decreasing weights may be a bit harder -- you would have to update all entries that follow that entry in the map. If that is your case, you may have to do some fancy footwork to reduce the cost of weight updates. From the original question, however, it seems implied that updates are not frequent.Rompish
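
A minimal sketch of the weight-increase idea from the comment above, written as a hypothetical method added inside WeightedProbMap (the name addWeight is an assumption, not part of the original answer):

// Hypothetical: append one more cumulative entry so that the slice
// (this.sum, this.sum + extraWeight] of the number line maps to elt.
// Requires extraWeight > 0, matching the constructor's precondition.
public void addWeight(EltType elt, int extraWeight) {
    this.elts.put(this.sum, elt);
    this.sum += extraWeight;
}
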
D
2

To do this, you have to cache the relative frequency of each value T. This gives you O(n) selection from the probability distribution at the price of an O(n) insertion cost (you have to update the relative frequency of every T upon every insertion).
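
One way to read this, as a rough sketch only (the class name, the use of a double for count/total, and rebuilding the frequencies in the constructor are assumptions, not something the answer spells out):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class RelFreqPicker<T> {
    // relFreq caches count / total for each T; it must be rebuilt whenever a
    // count changes, which is the O(n) insertion cost mentioned above.
    private final Map<T, Double> relFreq = new LinkedHashMap<T, Double>();
    private final Random rand = new Random();

    RelFreqPicker(Map<T, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) total += c;
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            relFreq.put(e.getKey(), e.getValue() / (double) total);
        }
    }

    // O(n) selection: walk the cumulative distribution until it passes a
    // uniform draw in [0, 1).
    T pick() {
        double u = rand.nextDouble();
        double cumulative = 0.0;
        T last = null;
        for (Map.Entry<T, Double> e : relFreq.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) return last;
        }
        return last; // guard against floating-point rounding
    }
}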

Dustindustman answered 6/3, 2011 at 17:28 Comment(2)
I can't really imagine that looping through a Map is faster than looping through an arraylist. Maps are excellent when you search for a key, and not that good for looping.Criterion
I am guessing he uses the Map to keep the cumulative-distribution-function consistent even when he changes the value of the Ts.Dustindustman
J
2

If you can store the total sum, that's quite easily done:

Just store the pairs (T, int) as a class or whatever in an ordinary array and then go over it:

// rand is a java.util.Random instance; total is the sum of all counts
int val = rand.nextInt(total);
for (Pair p : pairs) {
    val -= p.val;
    if (val < 0) return p;   // p holds the randomly chosen T
}

You can't get much faster, considering that looping through an ArrayList is the most efficient way to iterate over n values, and you obviously can't do better than O(n). The only overhead is the nextInt(), and you need that (or something similar) in every solution anyway. Depending on how you organize the ArrayList (sorted or not), other operations get cheaper or more expensive, but that's unimportant for this particular operation.

Edit: Although, thinking about it, the "you obviously can't do better than O(n)" isn't true. If you change the values in the array rarely, can allow an expensive preparation step, and memory isn't a problem, you can do better by storing a HashMap. If you've got, for example, the distribution T0: 2, T1: 3, T2: 1

You could insert (0, T0), (1, T0), (2, T1), (3, T1), (4, T1), (5, T2) into the hashmap.
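
A minimal sketch of this dense-HashMap idea (the class and method names are assumptions for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class DensePicker<T> {
    private final Map<Integer, T> slots = new HashMap<Integer, T>();
    private final Random rand = new Random();
    private int total = 0;

    // Expensive O(total) preparation: one entry per unit of weight.
    DensePicker(Map<T, Integer> counts) {
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                slots.put(total++, e.getKey());
            }
        }
    }

    // O(1) selection: a uniform index over the dense keys 0 .. total-1.
    T pick() {
        return slots.get(rand.nextInt(total));
    }
}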

Edit2: Or see phooji's approach which should be feasible for larger sets of data.

Jahnke answered 6/3, 2011 at 17:38 Comment(3)
"you can obviously not do better than O(n)" Please don't throw around math like that. Obvious improvement (probably not necessary, but faster than O(n)): store the pairs in order based on their cumulative weight; conduct a binary search that returns the closest element. Complexity: O(lg n). Cheers.Rompish
@phooji: Yep true, was too fast there. Though of another solution as well that's done in O(1) and edited that before seeing your solution.Jahnke
Your O(1) solution sounds an awful lot like the OP's approach, using a hashmap rather than an array.Rompish
K
1

Build an inverse map, Map<Integer, T>, so that every key is the sum of all the weights processed so far.

For example if you have this map:

T1 -> 10
T2 -> 8
T3 -> 3

This inverse map is:

10 -> T1
18 -> T2
21 -> T3

(For better performance, you can arrange your weights in descending order first.)

Then generate an evenly distributed random number between 0 and the sum of all weights, and perform a binary search for this number in the key set of the inverse map.
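
A minimal sketch of this approach, using a TreeMap so that the JDK's higherEntry does the binary search (the class and method names are assumptions):

import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class InverseMapPicker<T> {
    // Maps each cumulative weight to its element, as in the example above.
    private final TreeMap<Integer, T> inverse = new TreeMap<Integer, T>();
    private final Random rand = new Random();
    private int total = 0;

    InverseMapPicker(Map<T, Integer> weights) {
        for (Map.Entry<T, Integer> e : weights.entrySet()) {
            total += e.getValue();
            inverse.put(total, e.getKey());   // key = cumulative weight so far
        }
    }

    // O(log n): find the smallest cumulative weight strictly greater than x.
    T pick() {
        int x = rand.nextInt(total);          // uniform in [0, total)
        return inverse.higherEntry(x).getValue();
    }
}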

Kilometer answered 6/3, 2011 at 21:22 Comment(0)
H
0

Using an ArrayList would actually be even faster than using a Map, because you can do the selection in O(1).

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RandVal<T> {

    // list is expected to contain each key repeated as many times as its count
    List<T> list = new ArrayList<T>();
    Random rand = new Random();

    public T randomValue() {
        int next = rand.nextInt(list.size());
        return list.get(next);
    }

}

The only way this is a bad thing is if order matters (A A B B A B vs. A B B A B A, or something like that), but it obviously doesn't, because you're using a Map, which has no ordering...
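
A hypothetical population step, which the answer itself doesn't show (the class and helper names are assumptions): add each key to the list as many times as its count, so that a uniform index over the list yields a weighted pick over the keys.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RandValSetup {
    // Build the list for RandVal: each key appears count times, so a uniform
    // random index yields each key with probability count / total.
    static <T> List<T> expand(Map<T, Integer> counts) {
        List<T> list = new ArrayList<T>();
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                list.add(e.getKey());
            }
        }
        return list;
    }
}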

Hamrick answered 6/3, 2011 at 17:46 Comment(6)
From the (brief) description in the question, it looks like this is exactly the OP's approach, and it apparently doesn't scale for them.Rompish
You downvoted me based on a clarification that you edited into the post? I'm not sure I follow the logic there... Given the original post, this is the best solution (the OP didn't say how he used the ArrayList; you merely inferred that). In any event, I would hardly say this answer is 'not useful' given the context (which is the downvote criterion).Hamrick
@glowcoder: I'm sorry you disagree. I agree that the question edit was based on my interpretation, but I don't think the original post was unclear about the way in which the ArrayList was being used (stackoverflow.com/posts/5212089/revisions). Even so, I'd be happy to remove the downvote if you'd care to make your post a bit more self-contained, for example by mentioning how list is populated.Rompish
@Rompish list is populated in exactly the way that you edited in, which the OP said he did not want. My post is properly self-contained. I feel you have abused the ability to edit posts by clarifying something for the OP that wasn't your place to clarify. The fact is we don't know for sure what he meant, and it is not our place to edit the question to fit our answers, or to downvote other answers. Now I really don't care about the DV - I care much more about your misuse of abilities and misunderstanding of SO mechanics.Hamrick
@glowcoder: My edit was approved by a mod -- at my rep level, I cannot edit questions directly.Rompish
@Rompish anyone with a certain rep level can approve edits. Unfortunately, a lot of people don't actually read the edits; they just go ahead and approve them. In fact, a mod actually reverted your edits.Hamrick
E
-1

OP here.

I came up with an elegant solution! To clear up any misunderstanding: my original idea of storing all the keys in an ArrayList, repeated by their counts, completely disregarded the point of using a Map to store "instances of the Key using Integers"; any similar solutions are counterproductive! Assuming the Map is unordered, here is my solution:

public T randomPick(Random r) {
    // size() is assumed to return the total count, i.e. the sum of all
    // Integer values in the map, so randomValue is uniform over [0, total).
    int randomValue = r.nextInt(size());
    int currentSum = 0;
    T lastElement = null;

    for (T t : map.keySet()) {
        if (randomValue < currentSum + map.get(t)) {
            return t;
        }
        currentSum += map.get(t);
        lastElement = t;
    }
    return lastElement;
}

It compares the random value with the running sum plus the current element's count; if the random value is smaller, we return the current key. Otherwise we add that count to the sum and keep going. If the random value is never smaller than any cumulative sum, we fall back to returning lastElement.

Hope this clears it up.

Eddaeddana answered 8/3, 2011 at 2:28 Comment(2)
Thanks for the follow-up; I was curious what you ended up doing. Your method looks quite similar to @Voo's, so you might consider giving him/her an (additional) upvote. Cheers!Rompish
One more thing: you can make your code a fair bit more efficient if you iterate over denseBagMap.entrySet() rather than .keySet(). See here for an example: https://mcmap.net/q/1477599/-java-big-o-performance/… .Rompish
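
A minimal sketch of the entrySet() variant suggested in the comment above, under the same assumptions as the answer (a map field of type Map<T, Integer> and a size() method returning the sum of all counts):

// Sketch only: same algorithm, but iterating entrySet() avoids the two extra
// map.get(t) lookups per key. Requires java.util.Map to be imported.
public T randomPick(Random r) {
    int randomValue = r.nextInt(size());
    int currentSum = 0;
    T lastElement = null;

    for (Map.Entry<T, Integer> e : map.entrySet()) {
        currentSum += e.getValue();
        lastElement = e.getKey();
        if (randomValue < currentSum) {
            return lastElement;
        }
    }
    return lastElement;
}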
