How to use the kernel hashtable API?

I'm trying to understand and use the kernel hash tables and I've already read this and this link, but I didn't understand none of them. My first question is: Why our struct has to have the struct h_list inside it? If we're gonna access our struct through the struct h_list our struct shouldn't be within struct h_list and not the opposite? After reading the tutorial I've tried to write the following code:

DECLARE_HASHTABLE(nodes_hash, 3)

hash_init(nodes_hash);

struct h_node{
    int data;
    char name[MAX_NAME_SIZE]; /*The key is the name of lua state*/
    struct hlist_node node;
};

struct h_node a = {
    .data = 3,
    .name = "foo",
    .node = 0   
};

struct h_node b = {
    .data = 7,
    .name = "bar",
    .node = 0   
};    

hash_add(nodes_hash,&a.node, "foo");
hash_add(nodes_hash,&b.node, "bar");

But this does not even compile. What I'm doing wrong? I need that the key be the same name present in the struct h_node. So I would like that my hash table be like this:

PS: In my hash table it will never occur a collision (I'll handle it to never occur) so the key can be the name in the struct h_node

Why our struct has to have the struct h_list inside it? If we're gonna access our struct through the struct h_list our struct shouldn't be within struct h_list and not the opposite?

That's because of how hash tables are implemented in the Linux kernel. An hashtable is just an array of fixed size of struct hlist_head. Each of those represents a bucket, and is the head of a linked list. The hashtable only contains a bunch of linked lists of struct hlist_node, nothing else. It does not really "store" the entire user defined struct, it merely holds a pointer to the struct hlist_node field of each element.

When you add an element to the hashtable, a bucket is selected, and a pointer to the struct hlist_node field of your struct is inserted in the bucket list. When you later retrieve an element (for example through hash_for_each()), the container_of() macro is used to get your real structure back, knowing its type and the name of the struct member of type struct hlist_node inside your user defined structure.

This can be seen following the source code. For example, for hash_for_each() we have:

hash_for_each(name, bkt, obj, member) does:

 for ((bkt) = 0, obj = NULL; obj == NULL && (bkt) < HASH_SIZE(name);\
                 (bkt)++)\
         hlist_for_each_entry(obj, &name[bkt], member)

hlist_for_each_entry() does:

 for (pos = hlist_entry_safe((head)->first, typeof(*(pos)), member);\
      pos;                           \
      pos = hlist_entry_safe((pos)->member.next, typeof(*(pos)), member))

hlist_entry_safe() does:

 ({ typeof(ptr) ____ptr = (ptr); \
    ____ptr ? hlist_entry(____ptr, type, member) : NULL; \
 })

And finally hlist_entry() uses container_of() to get the real struct:

 #define hlist_entry(ptr, type, member) container_of(ptr,type,member)

I need that the key be the same name present in the struct h_node.

This is not possible natively. The Linux kernel hashtable API only deals with integer keys. If you take a look at the implementation in linux/hashtable.h, you can see the hash functions used are hash_32() and hash_64(), and they both take unsigned integer values (u32 and u64 respectively).

The Linux kernel hashtable API is very limited and it definitely does not implement the same kind of hashtable that you are used to in other programming languages. It cannot use strings as keys, and it has a fixed size.

If you want to use strings, you will have to hash those strings to generate an unsigned integer. To do so you can either use xxhash() or write your own function. The xxhash() function is relatively new and does not yet seem to be used in kernel code, so I think your kernel was most likely configured without it and you don't have it available.

In any case, beware that if the hash function transforms different strings into the same key, or if hash_add() ends up choosing the same index in the hashtable array to insert the elements, then the two elements will be placed in the same hashtable bucket. Therefore, when retrieving any element (using for example hash_for_each_possible()) you need to take this into consideration and correctly check its name.

Working example

Here's a complete working example to demonstrate basic usage of kernel hashtables, tested on kernel v4.9, but should work on the latest v5.7 too. Note that in this example I am allocating the variables on the stack of the module _init function just for simplicity. This means that I cannot do hash_for_each_possible() from anywhere else in the code except from inside that function. If you want a global hashtable capable of holding elements that are later accessed by different functions, you will need to allocate them dynamically using kmalloc().

// SPDX-License-Identifier: GPL-3.0
#include <linux/hashtable.h> // hashtable API
#include <linux/module.h>    // module_{init,exit}, MODULE_*
#include <linux/string.h>    // strcpy, strcmp
#include <linux/types.h>     // u32 etc.

#define MAX 32

struct h_node {
    int data;
    char name[MAX];
    struct hlist_node node;
};

DECLARE_HASHTABLE(tbl, 3);

// Just to demonstrate the behavior when two keys are equal.
static u32 myhash(const char *s) {
    u32 key = 0;
    char c;

    while ((c = *s++))
        key += c;

    return key;
}

static int __init myhashtable_init(void)
{
    struct h_node a, b, *cur;
    u32 key_a, key_b;
    unsigned bkt;

    pr_info("myhashtable: module loaded\n");

    a.data = 3;
    strcpy(a.name, "foo");

    b.data = 7;
    strcpy(b.name, "oof");

    /* Calculate key for each element.
     * Since the above hash function only sums the characters, we will
     * end up having two identical keys here.
     */
    key_a = myhash(a.name);
    key_b = myhash(b.name);

    pr_info("myhashtable: key_a = %u, key_b = %u\n", key_a, key_b);

    // Initialize the hashtable.
    hash_init(tbl);

    // Insert the elements.
    hash_add(tbl, &a.node, key_a);
    hash_add(tbl, &b.node, key_b);

    // List all elements in the table.
    hash_for_each(tbl, bkt, cur, node) {
        pr_info("myhashtable: element: data = %d, name = %s\n",
            cur->data, cur->name);
    }

    // Get the element with name = "foo".
    hash_for_each_possible(tbl, cur, node, key_a) {
        pr_info("myhashtable: match for key %u: data = %d, name = %s\n",
            key_a, cur->data, cur->name);

        // Check the name.
        if (!strcmp(cur->name, "foo")) {
            pr_info("myhashtable: element named \"foo\" found!\n");
            break;
        }
    }

    // Remove elements.
    hash_del(&a.node);
    hash_del(&b.node);

    return 0;
}

static void __exit myhashtable_exit(void)
{
    // Do nothing (needed to be able to unload the module).
    pr_info("myhashtable: module unloaded\n");
}


module_init(myhashtable_init);
module_exit(myhashtable_exit);
MODULE_VERSION("0.1");
MODULE_DESCRIPTION("Silly kernel hashtable API example module.");
MODULE_AUTHOR("Marco Bonelli");
MODULE_LICENSE("GPL");

dmesg output on my machine:

[ 3174.567029] myhashtable: key_a = 324, key_b = 324
[ 3174.567030] myhashtable: element: data = 7, name = oof
[ 3174.567031] myhashtable: element: data = 3, name = foo
[ 3174.567032] myhashtable: match for key 324: data = 7, name = oof
[ 3174.567033] myhashtable: match for key 324: data = 3, name = foo
[ 3174.567033] myhashtable: element named "foo" found!

Working example

Recommended topics

Hot tags