Lightweight 8 byte hash function algorithm
Asked Answered
M

4

10

I need to extract an 8 byte digest from a variable length string so I'm looking for such an algorithm that I will implement in c/c++. That will be part of a digital signature procedure on a microcontroller, so it has to be:

  • writable in few lines of code, since the firmware has to be kept as little as possible;
  • low in resource consumption, expecially ram (preferably less than 100 bytes);
  • strong enough that changing a single character at any point of the string would change the overall digest.

I took a look at existing algorithms such as crc64 but they seems to be too heavy for my platform.

Mimosa answered 10/11, 2012 at 19:0 Comment(3)
There are many hash functions available (and readily found). Which existing function(s) looked "close to" the desired goal, and why? If they weren't acceptable, why? There are a number of good results/reading for a simple "hash function C" - honestly, only the 3rd requirement posted looks of any interest. Also, since CRC was mentioned, is the goal a [general] hash or a checksum?Garver
Maybe this can be helpful: en.wikipedia.org/wiki/List_of_hash_functions Maybe check out sphlib as well but to clarify something 8 bytes will result in collisions so point 3 of your requirements can't be fulfilled by ANY hashing algorithm at least not for all strings and 8 bytes is pretty low.Callender
@pst: I took in account some of the existing hash functions that give a 64 bit output, but for example the crc64 needs far more than 100 bytes of ram. As I stated in the question, the goal is to obtain a message digest, so a cryptographic function would be better. Still, I need it to be lightweight more than strong, so I taken in account other kind of hash functions as well.Mimosa
O
5

As AndrewTomazos-Fathomling said, it's impossible to do a secure hash in 64 bits, so if that's your intention then my advice is STOP, pick up a book and read about cryptographically secure hashing.

If you don't plan on using this as a secure hash and you do not care about collisions or attacks, then the answer he gave you works just fine and you can tweak the primes P1 and P2 as necessary. I will give you another alternative which allows you to do tagged hashing and mixes things up more.

// Disclaimer: I make no claims about the quality of this particular hash - it's 
// certainly not a cryptographically secure hash, nor should it *ever* be 
// construed as such. 

unsigned long long quickhash64(const char *str, unsigned long long mix = 0)
{ // set 'mix' to some value other than zero if you want a tagged hash          
    const unsigned long long mulp = 2654435789;

    mix ^= 104395301;

    while(*str)
        mix += (*str++ * mulp) ^ (mix >> 23);

    return mix ^ (mix << 37);
}
Overlay answered 10/11, 2012 at 21:40 Comment(1)
I liked this one better because it keeps the sensitiveness against all chars of the string, other ones that used shifts loses the influence of the first chars of the string after certain lenghts. By the way I see that it could be shortened like, for example, uint64_t mix, mulp = 2654435789; while(*str) mix^=mulp**str++;.Mimosa
M
10

There is no chance to do a secure hash in 64 bits. Even SHA-1 at 160 bit is considered theoretically broken. You should use SHA2-256 if you really care about secure digital signing. If you don't care about security and just want a hash function that avoids non-adversarial collisions just use the following, it is fine:

constexpr uint64 P1 = 7;
constexpr uint64 P2 = 31;

uint64 hash = P1;
for (const char* p = s; *p != 0; p++) {
    hash = hash * P2 + *p;
}
Mettlesome answered 10/11, 2012 at 19:12 Comment(4)
+1, though strlen is not a great variable-name in a C program. :-PTaking
Thank you for your answer, though this doesn't really fullfill my third point: mystring1 => 10000786a32ed, mystring2 => 10000786a32ee. I'd need something that can "propagate" a bit more the single character change through the hash.Mimosa
You want what is called the "avalanche effect", but ask yourself why you want this. It really only makes sense in the context of secure hashing, and with only 64bits it will never be secure against a brute force attack. You can get more flipped bits by using two larger primes for P1 and P2, but as I said there is no point.Mettlesome
Why, exactly, do you need changes to avalanche faster? As @AndrewTomazos-Fathomling said, it is starting to sound as if you want to do "secure hashing" but with only 64-bits it's just not going to work.Overlay
O
5

As AndrewTomazos-Fathomling said, it's impossible to do a secure hash in 64 bits, so if that's your intention then my advice is STOP, pick up a book and read about cryptographically secure hashing.

If you don't plan on using this as a secure hash and you do not care about collisions or attacks, then the answer he gave you works just fine and you can tweak the primes P1 and P2 as necessary. I will give you another alternative which allows you to do tagged hashing and mixes things up more.

// Disclaimer: I make no claims about the quality of this particular hash - it's 
// certainly not a cryptographically secure hash, nor should it *ever* be 
// construed as such. 

unsigned long long quickhash64(const char *str, unsigned long long mix = 0)
{ // set 'mix' to some value other than zero if you want a tagged hash          
    const unsigned long long mulp = 2654435789;

    mix ^= 104395301;

    while(*str)
        mix += (*str++ * mulp) ^ (mix >> 23);

    return mix ^ (mix << 37);
}
Overlay answered 10/11, 2012 at 21:40 Comment(1)
I liked this one better because it keeps the sensitiveness against all chars of the string, other ones that used shifts loses the influence of the first chars of the string after certain lenghts. By the way I see that it could be shortened like, for example, uint64_t mix, mulp = 2654435789; while(*str) mix^=mulp**str++;.Mimosa
C
3

Here is a modified version of a 32 bit version I found in my old source files

static unsigned long long llhash(const char *str)
{
    unsigned long long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c;

    return hash;
}

But hashing will always result in collisions. Of course some algorithms are better than others.

Edit: I found the source of the 32 bit version: http://www.cse.yorku.ca/~oz/hash.html

Callender answered 10/11, 2012 at 19:17 Comment(0)
T
2

I had the exact same requirement, and I settled for FNV-1A, after dismissing SIP hash (implemented by bloomberg here).

I found an FNV implementation here:

https://github.com/foonathan/string_id/blob/master/hash.hpp

which is:

constexpr uint64_t fnv_basis = 14695981039346656037ull;
constexpr uint64_t fnv_prime = 1099511628211ull;

// FNV-1a 64 bit hash of null terminated buffer
uint64_t fnv1a_hash(const char* str, uint64_t hash = fnv_basis)
{
    return *str ? fnv1a_hash(str + 1, (hash ^ *str) * fnv_prime) : hash;
}

It appears he is looping using tail recursion. And stop condition is the null byte.
(boost uses hash_range which is hash_combining each element in chain I guess.)

License is zlib and copyright is Jonathan Müller. Though I'm not convinced a oneliner can be legally licensed if it implements research by other persons (Fowler-Noll-Vo).

Therewith answered 19/4, 2018 at 9:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.