Why our struct has to have the struct h_list
inside it? If we're gonna access our struct through the struct h_list
our struct shouldn't be within struct h_list
and not the opposite?
That's because of how hash tables are implemented in the Linux kernel. An hashtable is just an array of fixed size of struct hlist_head
. Each of those represents a bucket, and is the head of a linked list. The hashtable only contains a bunch of linked lists of struct hlist_node
, nothing else. It does not really "store" the entire user defined struct, it merely holds a pointer to the struct hlist_node
field of each element.
When you add an element to the hashtable, a bucket is selected, and a pointer to the struct hlist_node
field of your struct is inserted in the bucket list. When you later retrieve an element (for example through hash_for_each()
), the container_of()
macro is used to get your real structure back, knowing its type and the name of the struct member of type struct hlist_node
inside your user defined structure.
This can be seen following the source code. For example, for hash_for_each()
we have:
hash_for_each(name, bkt, obj, member)
does:
for ((bkt) = 0, obj = NULL; obj == NULL && (bkt) < HASH_SIZE(name);\
(bkt)++)\
hlist_for_each_entry(obj, &name[bkt], member)
hlist_for_each_entry()
does:
for (pos = hlist_entry_safe((head)->first, typeof(*(pos)), member);\
pos; \
pos = hlist_entry_safe((pos)->member.next, typeof(*(pos)), member))
hlist_entry_safe()
does:
({ typeof(ptr) ____ptr = (ptr); \
____ptr ? hlist_entry(____ptr, type, member) : NULL; \
})
And finally hlist_entry()
uses container_of()
to get the real struct:
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)
I need that the key be the same name present in the struct h_node
.
This is not possible natively. The Linux kernel hashtable API only deals with integer keys. If you take a look at the implementation in linux/hashtable.h
, you can see the hash functions used are hash_32()
and hash_64()
, and they both take unsigned integer values (u32
and u64
respectively).
The Linux kernel hashtable API is very limited and it definitely does not implement the same kind of hashtable that you are used to in other programming languages. It cannot use strings as keys, and it has a fixed size.
If you want to use strings, you will have to hash those strings to generate an unsigned integer. To do so you can either use xxhash()
or write your own function. The xxhash()
function is relatively new and does not yet seem to be used in kernel code, so I think your kernel was most likely configured without it and you don't have it available.
In any case, beware that if the hash function transforms different strings into the same key, or if hash_add()
ends up choosing the same index in the hashtable array to insert the elements, then the two elements will be placed in the same hashtable bucket. Therefore, when retrieving any element (using for example hash_for_each_possible()
) you need to take this into consideration and correctly check its name
.
Working example
Here's a complete working example to demonstrate basic usage of kernel hashtables, tested on kernel v4.9, but should work on the latest v5.7 too. Note that in this example I am allocating the variables on the stack of the module _init
function just for simplicity. This means that I cannot do hash_for_each_possible()
from anywhere else in the code except from inside that function. If you want a global hashtable capable of holding elements that are later accessed by different functions, you will need to allocate them dynamically using kmalloc()
.
// SPDX-License-Identifier: GPL-3.0
#include <linux/hashtable.h> // hashtable API
#include <linux/module.h> // module_{init,exit}, MODULE_*
#include <linux/string.h> // strcpy, strcmp
#include <linux/types.h> // u32 etc.
#define MAX 32
struct h_node {
int data;
char name[MAX];
struct hlist_node node;
};
DECLARE_HASHTABLE(tbl, 3);
// Just to demonstrate the behavior when two keys are equal.
static u32 myhash(const char *s) {
u32 key = 0;
char c;
while ((c = *s++))
key += c;
return key;
}
static int __init myhashtable_init(void)
{
struct h_node a, b, *cur;
u32 key_a, key_b;
unsigned bkt;
pr_info("myhashtable: module loaded\n");
a.data = 3;
strcpy(a.name, "foo");
b.data = 7;
strcpy(b.name, "oof");
/* Calculate key for each element.
* Since the above hash function only sums the characters, we will
* end up having two identical keys here.
*/
key_a = myhash(a.name);
key_b = myhash(b.name);
pr_info("myhashtable: key_a = %u, key_b = %u\n", key_a, key_b);
// Initialize the hashtable.
hash_init(tbl);
// Insert the elements.
hash_add(tbl, &a.node, key_a);
hash_add(tbl, &b.node, key_b);
// List all elements in the table.
hash_for_each(tbl, bkt, cur, node) {
pr_info("myhashtable: element: data = %d, name = %s\n",
cur->data, cur->name);
}
// Get the element with name = "foo".
hash_for_each_possible(tbl, cur, node, key_a) {
pr_info("myhashtable: match for key %u: data = %d, name = %s\n",
key_a, cur->data, cur->name);
// Check the name.
if (!strcmp(cur->name, "foo")) {
pr_info("myhashtable: element named \"foo\" found!\n");
break;
}
}
// Remove elements.
hash_del(&a.node);
hash_del(&b.node);
return 0;
}
static void __exit myhashtable_exit(void)
{
// Do nothing (needed to be able to unload the module).
pr_info("myhashtable: module unloaded\n");
}
module_init(myhashtable_init);
module_exit(myhashtable_exit);
MODULE_VERSION("0.1");
MODULE_DESCRIPTION("Silly kernel hashtable API example module.");
MODULE_AUTHOR("Marco Bonelli");
MODULE_LICENSE("GPL");
dmesg
output on my machine:
[ 3174.567029] myhashtable: key_a = 324, key_b = 324
[ 3174.567030] myhashtable: element: data = 7, name = oof
[ 3174.567031] myhashtable: element: data = 3, name = foo
[ 3174.567032] myhashtable: match for key 324: data = 7, name = oof
[ 3174.567033] myhashtable: match for key 324: data = 3, name = foo
[ 3174.567033] myhashtable: element named "foo" found!
unsigned long
would not be a good idea, because identical string literals do not necessarily have the same address (especially if they in different .c files) so the search key won't match the insertion key. You could usexxhash
to hash the string contents to anunsigned long
value and use that value as the key. Identical byte sequences will xxhash to the same value (as long as they are hashed with the same seed parameter), so it is stable. It needsCONFIG_XXHASH
. – OpenmouthedCONFIG_XXHASH
being configured, you could use your own code to hash the string contents using an FNV (Fowler/Noll/Vo) algorithm.) – Openmouthed