Perfect hash function for strings known in advance
Asked Answered
B

4

5

I have 4000 strings and I want to create a perfect hash table with these strings. The strings are known in advance, so my first idea was to use a series of if statements:

 if (name=="aaa")
      return 1;
 else if (name=="bbb")
      return 2;
        .
        .
        .
 // 4000th `if' statement

However, this would be very inefficient. Is there a better way?

Blob answered 29/12, 2014 at 18:36 Comment(3)
If you know all the strings a priori, are you sure you need to hash them?Flynt
I need to store a graph as a adjacency list so I guess hash table is required.@FlyntBlob
Hashing is the most efficient way to do this for that many strings, if you have a priori information about your strings you can create a good hashing function by using something like gperf.Diazine
F
13

gperf is a tool that does exactly that:

GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.

According to the documentation, gperf is used to generate the reserved keyword recogniser for lexers in GNU C, GNU C++, GNU Java, GNU Pascal, GNU Modula 3, and GNU indent.

The way it works is described in GPERF: A Perfect Hash Function Generator by Douglas C. Schmidt.

Flynt answered 29/12, 2014 at 18:39 Comment(3)
I think it is too much for my application, I prefer simpler thingBlob
I'll bookmark this thread to see your perfect solution that is simpler than gperf.Scalage
A runtime solution may be easier to build. This library is apparently more efficient memory-wise: cmph.sourceforge.netRemodel
P
10

Better later than never, I believe this now finally answers the OP question:

Simply use https://github.com/serge-sans-paille/frozen -- a Compile-time (constexpr) library of immutable containers for C++ (using "perfect hash" under the hood).

On my tests, it performed in pair with the famous GNU's gperf perfect hash C code generator.

On your pseudo-code terms:

#include <frozen/unordered_map.h>
#include <frozen/string.h>

constexpr frozen::unordered_map<frozen::string, int, 2> olaf = {
    {"aaa", 1},
    {"bbb", 2},
    .
    .
    .
    // 4000th element
};

return olaf.at(name);

Will respond in O(1) time rather than OP's O(n) -- O(n) assuming the compiler wouldn't optimize your if chain, which it might do)

Preventive answered 3/2, 2021 at 22:14 Comment(0)
P
7

Since the question is still unanswered and I'm about to add the same functionality to my HFT platform, I'll share my inventory for Perfect Hash Algorithms in C++. It is harder than I thought to find an open, flexible and bug free implementation, so I'm sharing the ones I didn't drop yet:

Preventive answered 11/12, 2018 at 23:58 Comment(0)
T
3

I believe @NPE's answer is very reasonable, and I doubt it is too much for your application as you seem to imply.

Consider the following example: suppose you have your "engine" logic (that is: your application's functionality) contained in a file called engine.hpp:

// this is engine.hpp
#pragma once
#include <iostream>
void standalone() {
  std::cout << "called standalone" << std::endl;
}
struct Foo {
  static void first() {
    std::cout << "called Foo::first()" << std::endl;
  }
  static void second() {
    std::cout << "called Foo::second()" << std::endl;
  }  
};
// other functions...

and suppose you want to dispatch the different functions based on the map:

"standalone" dispatches void standalone()
"first" dispatches Foo::first()
"second" dispatches Foo::second()
# other dispatch rules...

You can do that using the following gperf input file (I called it "lookups.gperf"):

%{

#include "engine.hpp"

struct CommandMap {
    const char *name;
    void (*dispatch) (void);
};

%}

%ignore-case
%language=C++
%define class-name Commands
%define lookup-function-name Lookup
struct CommandMap

%%
standalone, standalone
first, Foo::first
second, Foo::second

Then you can use gperf to create a lookups.hpp file using a simple command:

 gperf -tCG lookups.gperf > lookups.hpp

Once I have that in place, the following main subroutine will dispatch commands based on what I type:

#include <iostream>
#include "engine.hpp" // this is my application engine
#include "lookups.hpp" // this is gperf's output

int main() {

  std::string command;

  while(std::cin >> command) {
    auto match = Commands::Lookup(command.c_str(), command.size());
    if(match) {
      match->dispatch();
    } else {
      std::cerr << "invalid command" << std::endl;
    }
  }
}

Compile it:

 g++ main.cpp -std=c++11

and run it:

$ ./a.out
standalone
called standalone
first
called Foo::first()
Second
called Foo::second()
SECOND
called Foo::second()
first
called Foo::first()
frst
invalid command

Notice that once you have generated lookups.hpp your application has no dependency whatsoever in gperf.

Disclaimer: I took inspiration for this example from this site.

Tantalite answered 29/12, 2014 at 19:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.