Perfect minimal hash for mathematical combinations
Asked Answered
L

2

4

First, define two integers N and K, where N >= K, both known at compile time. For example: N = 8 and K = 3.

Next, define a set of integers [0, N) (or [1, N] if that makes the answer simpler) and call it S. For example: {0, 1, 2, 3, 4, 5, 6, 7}

The number of subsets of S with K elements is given by the formula C(N, K). Example

My problem is this: Create a perfect minimal hash for those subsets. The size of the example hash table will be C(8, 3) or 56.

I don't care about ordering, only that there be 56 entries in the hash table, and that I can determine the hash quickly from a set of K integers. I also don't care about reversibility.

Example hash: hash({5, 2, 3}) = 42. (The number 42 isn't important, at least not here)

Is there a generic algorithm for this that will work with any values of N and K? I wasn't able to find one by searching Google, or my own naive efforts.

Leveloff answered 4/1, 2013 at 16:27 Comment(2)
Just to be clear; what are the input and output of your hash function?Consonantal
@OliCharlesworth Edited. Input is one of the possible combinations, output is an integer in [0, C(N, K)).Leveloff
L
3

There is an algorithm to code and decode a combination into its number in the lexicographical order of all combinations with a given fixed K. The algorithm is linear to N for both code and decode of the combination. What language are you interested in?

EDIT: here is example code in c++(it founds the lexicographical number of a combination in the sequence of all combinations of n elements as opposed to the ones with k elements but is really good starting point):

typedef long long ll;

// Returns the number in the lexicographical order of all combinations of n numbers
// of the provided combination. 
ll code(vector<int> a,int n)
{
    sort(a.begin(),a.end());
    int cur = 0;
    int m = a.size();

    ll res =0;
    for(int i=0;i<a.size();i++)
    {
        if(a[i] == cur+1)
        {
            res++;
            cur = a[i];
            continue;
        }
        else
        {
            res++;
            int number_of_greater_nums = n - a[i];
            for(int j = a[i]-1,increment=1;j>cur;j--,increment++)
                res += 1LL << (number_of_greater_nums+increment);
            cur = a[i];
        }
    }
    return res;
}
// Takes the lexicographical code of a combination of n numbers and returns the 
// combination
vector<int> decode(ll kod, int n)
{
    vector<int> res;
    int cur = 0;

    int left = n; // Out of how many numbers are we left to choose.
    while(kod)
    {
        ll all = 1LL << left;// how many are the total combinations
        for(int i=n;i>=0;i--)
        {
            if(all - (1LL << (n-i+1)) +1 <= kod)
            {
                res.push_back(i);
                left = n-i;
                kod -= all - (1LL << (n-i+1)) +1;
                break;
            }
        }
    }
    return res;
}

I am sorry I have an algorithm for the problem you are asking for right now, but I believe it will be a good exercise to try to understand what I do above. Truth is this is one of the algorithms I teach in the course "Design and analysis of algorithms" and that is why I had it pre-written.

Landtag answered 4/1, 2013 at 16:30 Comment(10)
Is there no constant time algorithm? I'd really prefer that.Leveloff
@KendallFrey You can not have a constant time algorithm - your input is of size N (for the problem I solve above)! You can not perform better than the algorithm above. I guess you can write a O(K) solution for your problem by slightly modifying the solution above and again you can't perform better as this is the input size.Landtag
Sorry, by constant time I meant O(K). K will be a compile-time constant.Leveloff
@KendallFrey I believe you can get as low as O(K) assuming you have already calculated the powers of two up to N. Still I think that as K will probably be in the order of N there will not be a huge difference(Number of combinations grows really fast so both N and K will be quite small).Landtag
@KendallFrey is it possible that you got the complexity for my solution wrong? My solution is linear to the number of numbers in S namely N, not to the number of possible combinations.Landtag
I understand the O(N) complexity, but it seems excessive to me because my specific implementation will have N quite large (< 100 though).Leveloff
Why would you cache powers of two though? A bitshift is one of the fastest things you can doDecker
@harold true. I can't seem to remember why I did it and it makes no sense to me now :). I will fix thatLandtag
@izomorphius sort(a.begin(),a.end()) is not linearStoryteller
it is if your values are in small range. Seems there is optimization to use count sort in such cases. Verified on gcc 4.3 looong agoLandtag
C
1

This is what you (and I) need:

hash() maps k-tuples from [1..n] onto the set 1..C(n,k)\subset N. The effort is k subtractions (and O(k) is a lower bound anyway, see Strandjev's remark above):

// bino[n][k] is (n "over" k) = C(n,k) = {n \choose k}
// these are assumed to be precomputed globals

int hash(V a,int n, int k) {// V is assumed to be ordered, a_k<...<a_1
  // hash(a_k,..,a_2,a_1) = (n k) - sum_(i=1)^k (n-a_i   i) 
  // ii is "inverse i", runs from left to right

  int res = bino[n][k];
  int i;

  for(unsigned int ii = 0; ii < a.size(); ++ii) {
    i = a.size() - ii;   
    res = res - bino[n-a[ii]][i];
  }
  return res;
}
Connaught answered 7/12, 2013 at 14:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.