How to use the kernel hashtable API?
Asked Answered
A

1

10

I'm trying to understand and use the kernel hash tables and I've already read this and this link, but I didn't understand none of them. My first question is: Why our struct has to have the struct h_list inside it? If we're gonna access our struct through the struct h_list our struct shouldn't be within struct h_list and not the opposite? After reading the tutorial I've tried to write the following code:

DECLARE_HASHTABLE(nodes_hash, 3)

hash_init(nodes_hash);

struct h_node{
    int data;
    char name[MAX_NAME_SIZE]; /*The key is the name of lua state*/
    struct hlist_node node;
};

struct h_node a = {
    .data = 3,
    .name = "foo",
    .node = 0   
};

struct h_node b = {
    .data = 7,
    .name = "bar",
    .node = 0   
};    

hash_add(nodes_hash,&a.node, "foo");
hash_add(nodes_hash,&b.node, "bar");

But this does not even compile. What I'm doing wrong? I need that the key be the same name present in the struct h_node. So I would like that my hash table be like this:

enter image description here

PS: In my hash table it will never occur a collision (I'll handle it to never occur) so the key can be the name in the struct h_node

Accomplice answered 26/3, 2020 at 15:34 Comment(4)
Got it, but I still need to understand how to insert the elements in the hash table and how it works.Accomplice
Just FYI when you say "I've already read this and this link" the two links that you provide point to the same URL. In any case, you should also provide the compilation errors that occur when compiling your code.Botha
The key needs to be an integer type, but you are using a string literal. Casting the address of the string literal to an unsigned long would not be a good idea, because identical string literals do not necessarily have the same address (especially if they in different .c files) so the search key won't match the insertion key. You could use xxhash to hash the string contents to an unsigned long value and use that value as the key. Identical byte sequences will xxhash to the same value (as long as they are hashed with the same seed parameter), so it is stable. It needs CONFIG_XXHASH.Openmouthed
(If you cannot rely on CONFIG_XXHASH being configured, you could use your own code to hash the string contents using an FNV (Fowler/Noll/Vo) algorithm.)Openmouthed
B
20

Why our struct has to have the struct h_list inside it? If we're gonna access our struct through the struct h_list our struct shouldn't be within struct h_list and not the opposite?

That's because of how hash tables are implemented in the Linux kernel. An hashtable is just an array of fixed size of struct hlist_head. Each of those represents a bucket, and is the head of a linked list. The hashtable only contains a bunch of linked lists of struct hlist_node, nothing else. It does not really "store" the entire user defined struct, it merely holds a pointer to the struct hlist_node field of each element.

When you add an element to the hashtable, a bucket is selected, and a pointer to the struct hlist_node field of your struct is inserted in the bucket list. When you later retrieve an element (for example through hash_for_each()), the container_of() macro is used to get your real structure back, knowing its type and the name of the struct member of type struct hlist_node inside your user defined structure.

This can be seen following the source code. For example, for hash_for_each() we have:

  1. hash_for_each(name, bkt, obj, member) does:

     for ((bkt) = 0, obj = NULL; obj == NULL && (bkt) < HASH_SIZE(name);\
                     (bkt)++)\
             hlist_for_each_entry(obj, &name[bkt], member)
    
  2. hlist_for_each_entry() does:

     for (pos = hlist_entry_safe((head)->first, typeof(*(pos)), member);\
          pos;                           \
          pos = hlist_entry_safe((pos)->member.next, typeof(*(pos)), member))
    
  3. hlist_entry_safe() does:

     ({ typeof(ptr) ____ptr = (ptr); \
        ____ptr ? hlist_entry(____ptr, type, member) : NULL; \
     })
    
  4. And finally hlist_entry() uses container_of() to get the real struct:

     #define hlist_entry(ptr, type, member) container_of(ptr,type,member)
    

I need that the key be the same name present in the struct h_node.

This is not possible natively. The Linux kernel hashtable API only deals with integer keys. If you take a look at the implementation in linux/hashtable.h, you can see the hash functions used are hash_32() and hash_64(), and they both take unsigned integer values (u32 and u64 respectively).

The Linux kernel hashtable API is very limited and it definitely does not implement the same kind of hashtable that you are used to in other programming languages. It cannot use strings as keys, and it has a fixed size.

If you want to use strings, you will have to hash those strings to generate an unsigned integer. To do so you can either use xxhash() or write your own function. The xxhash() function is relatively new and does not yet seem to be used in kernel code, so I think your kernel was most likely configured without it and you don't have it available.

In any case, beware that if the hash function transforms different strings into the same key, or if hash_add() ends up choosing the same index in the hashtable array to insert the elements, then the two elements will be placed in the same hashtable bucket. Therefore, when retrieving any element (using for example hash_for_each_possible()) you need to take this into consideration and correctly check its name.

Working example

Here's a complete working example to demonstrate basic usage of kernel hashtables, tested on kernel v4.9, but should work on the latest v5.7 too. Note that in this example I am allocating the variables on the stack of the module _init function just for simplicity. This means that I cannot do hash_for_each_possible() from anywhere else in the code except from inside that function. If you want a global hashtable capable of holding elements that are later accessed by different functions, you will need to allocate them dynamically using kmalloc().

// SPDX-License-Identifier: GPL-3.0
#include <linux/hashtable.h> // hashtable API
#include <linux/module.h>    // module_{init,exit}, MODULE_*
#include <linux/string.h>    // strcpy, strcmp
#include <linux/types.h>     // u32 etc.

#define MAX 32

struct h_node {
    int data;
    char name[MAX];
    struct hlist_node node;
};

DECLARE_HASHTABLE(tbl, 3);

// Just to demonstrate the behavior when two keys are equal.
static u32 myhash(const char *s) {
    u32 key = 0;
    char c;

    while ((c = *s++))
        key += c;

    return key;
}

static int __init myhashtable_init(void)
{
    struct h_node a, b, *cur;
    u32 key_a, key_b;
    unsigned bkt;

    pr_info("myhashtable: module loaded\n");

    a.data = 3;
    strcpy(a.name, "foo");

    b.data = 7;
    strcpy(b.name, "oof");

    /* Calculate key for each element.
     * Since the above hash function only sums the characters, we will
     * end up having two identical keys here.
     */
    key_a = myhash(a.name);
    key_b = myhash(b.name);

    pr_info("myhashtable: key_a = %u, key_b = %u\n", key_a, key_b);

    // Initialize the hashtable.
    hash_init(tbl);

    // Insert the elements.
    hash_add(tbl, &a.node, key_a);
    hash_add(tbl, &b.node, key_b);

    // List all elements in the table.
    hash_for_each(tbl, bkt, cur, node) {
        pr_info("myhashtable: element: data = %d, name = %s\n",
            cur->data, cur->name);
    }

    // Get the element with name = "foo".
    hash_for_each_possible(tbl, cur, node, key_a) {
        pr_info("myhashtable: match for key %u: data = %d, name = %s\n",
            key_a, cur->data, cur->name);

        // Check the name.
        if (!strcmp(cur->name, "foo")) {
            pr_info("myhashtable: element named \"foo\" found!\n");
            break;
        }
    }

    // Remove elements.
    hash_del(&a.node);
    hash_del(&b.node);

    return 0;
}

static void __exit myhashtable_exit(void)
{
    // Do nothing (needed to be able to unload the module).
    pr_info("myhashtable: module unloaded\n");
}


module_init(myhashtable_init);
module_exit(myhashtable_exit);
MODULE_VERSION("0.1");
MODULE_DESCRIPTION("Silly kernel hashtable API example module.");
MODULE_AUTHOR("Marco Bonelli");
MODULE_LICENSE("GPL");

dmesg output on my machine:

[ 3174.567029] myhashtable: key_a = 324, key_b = 324
[ 3174.567030] myhashtable: element: data = 7, name = oof
[ 3174.567031] myhashtable: element: data = 3, name = foo
[ 3174.567032] myhashtable: match for key 324: data = 7, name = oof
[ 3174.567033] myhashtable: match for key 324: data = 3, name = foo
[ 3174.567033] myhashtable: element named "foo" found!
Botha answered 26/3, 2020 at 18:22 Comment(5)
"If you want a global hashtable capable of holding elements that are later accessed by different functions, you will need to allocate them dynamically using kmalloc()." This includes the nodes which will be inserted or just the hash table itself ?Accomplice
@Accomplice it includes the nodes.Botha
I've tried to call hash_init in the global context but didn't work, when I call inside a function this works, there is a reason for that? There is a way to call it in the global scope?Accomplice
@Accomplice why do you think that you can run code outside of a function? There is no way to do that in C. hash_init() is not just some fancy macro, it does a real function call.Botha
@Matheus, @Marco: alternatively, instead of DECLARE_HASHTABLE() + hash_init(), you can use the DEFINE_HASHTABLE() macro, which defines and initializes the hashtable.Menadione

© 2022 - 2024 — McMap. All rights reserved.