near perfect or perfect hash of memory addresses in c
Asked Answered
F

1

6

I have a list of memory addresses from 0xc0003000 to 0xc04a0144 there are many gaps and < 4096 entries in the list. It is known at compile time and I want to make a perfect hash for it.

However looking up perfect hashing online gives me information mostly related to hashing strings and they don't seem to translate well.

To be clear I want to be able to obtain the memory address at run time and check that it is in the hash quickly. Currently I am using a binary search which is on average about 8 loops to find the answer.

Any ideas what tree I should bark up?

Foetus answered 26/7, 2012 at 20:6 Comment(10)
How about balanced trees, like B-tree or red-black ?Lebaron
I think the radix tree is the best search tree for sparse integer value search.Faustino
Can't you just use the address itself as a 4-byte string? gperf has support for working on non-NUL-terminated strings: gnu.org/software/gperf/manual/gperf.html#Binary-StringsPerez
I have a list of integer values and I want to check if a run time generated value is in that list. I will look at a radix tree, bitset is c++ this is strait c. I believe a balanced tree is the same as a binary search for performance, which is what i am hoping to beat.Foetus
gperf sounds like exactly what you need.Cutlass
Radix tree works good with memory pages (all addresses within the page have the same address prefix).Faustino
I am trying to use gperf in a binary manner but I don't see how that is supposed to be done, any ideas? the -l flag doesnt seem to do what I expectFoetus
I've done a little test run of the gperf -l approach. I got it to work, but I'm not sure it'll be what you want. It does extra comparisons because it expects to work with strings of various lengths. There doesn't seem to be a way to tell it that the string will always be exactly 4 bytes. It might still be more efficient than some of the alternatives though. Should I post my code as an answer?Perez
yeah please do, I got some okish results from -l but the problem I was running into was that my binary data occasionally contains 0x0a which is a newline and it sees that as a delimiter. I tried changing the delimiter with the -e flag to no avail.Foetus
As a side note I am going to give a radix tree a shot, i will benchmark and see how the performance compares to the binary search.Foetus
P
3

Here's a sample gperf program. I included a NUL and a newline in the sample data to prove that they don't cause it to fail.

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <arpa/inet.h>
%}
%%
"\xc0\x01\x02\x03"
"\xc0\xff\xff\xff"
"\xc0\xff\x00\xff"
"\xc0\x0a\xff\xff"
%%
int main(int argc, const char **argv)
{
    int i;

    for(i=1;i<argc;++i) {
        uint32_t addr = ntohl(strtoul(argv[i], 0, 16));
        if(in_word_set((char *)&addr, 4))
            printf("0x%08"PRIx32" is in the list.\n", htonl(addr));
        else
            printf("0x%08"PRIx32" is not in the list.\n", htonl(addr));
    }
    return 0;
}

Save as addrs.gperf, compile and test with

gperf -l addrs.gperf > addrs.c
gcc addrs.c -o addrs
./addrs c0000000 c0010203 c0ffffff c00affff c0ff0aff c0ffff00 c0ff00ff
Perez answered 27/7, 2012 at 20:31 Comment(2)
It would look a lot cleaner, and run a little bit faster, if gperf was actually designed to be used for this purpose.Perez
This works well for what I was doing and is about 40% faster than the binary search (10,000,000 loops). The radix tree ended up being about equal to the binary search, it was marginally better.Foetus

© 2022 - 2024 — McMap. All rights reserved.