Mysql hamming distance of hexadecimal values
Asked Answered
S

1

16

I have some hashes stored in mysql, which I would fetch with comparison by hamming distance.

Hashes stored are these:

qw 1 ffe71b001820a1fd 
qw 2 ffffb81c1c3838a0 
qw 3 fff8381c1c3e3828 
qw 4 fffa181c3c2e3920 
qw 5 fffa981c1c3e2820 
qw 6 ff5f1c38387c1c04 
qw 7 fff1e0c1c38387ef 
qw 8 fffa181c1c3e3820 
qw 9 fffa381c1c3e3828

I normally fetch like:

SELECT product_id, HAMMING_DISTANCE(phash, 'phashfromuserinput') ;

But in mysql hamming distance is bitwise operator which I can do if strings were only numbers:

SELECT pagedata,BIT_COUNT(pagecontent^'$encrypted')searchengine WHERE pagecontent > 2 ; ")

It only works in integer (number) but my requirement is work with numbers and alphabets, for example:

74898fababfbef46 and 95efabfeba752545

From my little research I know that first I have to convert field to binary and then use bitcount by using CAST or CONVERT like:

SELECT BIT_COUNT( CONV( hash, 2, 10 ) ^ 
0b0000000101100111111100011110000011100000111100011011111110011011 )

or

SELECT BIT_COUNT(CAST(hash AS BINARY)) FROM data;

This is ok as converting data to binary and using bitcount. Now question arises that varbinary characters/hashes stored in mysql already are alphanumeric and if I convert field to varbinary and bitcount then it will not work as stored hashes are not binary strings.

What should I do?

I was refering as php hamming distance matching example of:

function HammingDistance($bin1, $bin2) {
    $a1 = str_split($bin1);
    $a2 = str_split($bin2);
    $dh = 0;
    for ($i = 0; $i < count($a1); $i++) 
        if($a1[$i] != $a2[$i]) $dh++;
    return $dh;
}

echo HammingDistance('10101010','01010101'); //returns 8

But I'm not understanding how to match with mysql and fetch, because I can't implement it in mysql.

Senatorial answered 20/6, 2015 at 11:29 Comment(7)
Hamming distance works on binary values. The first nine values appear to be 16 hex digits, easy to interpret as 64-bit binary values. We know how to work with that. Then you say "only works in integer"... that's sort of true, in that we can represent 64-bit binary value as a BIGINT. Then you say your requirement is "and alphabets" [sic], and you show values that contain 'v' and 'g', and those aren't valid hex digits. What in the plastic? Before your question can be answered, you need to explain what binary value 95gfgdgd75425456 is supposed to represent.Reinhardt
Sorry u hadn't understand question.In short,i have hashes and yes they are hex decimal stored in mysql.Just want to compare that with i' m sending ,which can be done with bit_count,but i have heard it works only in integer.So if i use normally use as SELECT pagedata,BIT_COUNT(pagecontent^'$encrypted')searchengine WHERE pagecontent > 2 ; "),my main doubt is it work in alphanumeric or not? ANd that's doubt made me to do research for alternative of bit_count.I get it work in integer only from here : #4777570Senatorial
I understood the question. What I don't understand is what binary value the strings 74898acvdf566556 and 95gfgdgd7542545 are supposed to represent. (Those are sixteeen characters, and most of the characters are valid hex digits, but the characters v and g are not valid hexadecimal digits.) As to your "main doubt is it work in alphanumeric or not"... No, it doesn't work. Hamming distance works on binary values. It's easy to convert hexadecimal string representation to binary...Reinhardt
What I don't understand is how you plan on handling "alpahanumeric" strings that contain characters that aren't valid hexadecimal digits. Converting 16 hexadecimal characters to 64-bit binary is not hard CONVERT(CONV('ffe71b001820a1fd',16,10),UNSIGNED) Reference: CONV() function and CONVERT() function.Reinhardt
Ok,actually 74898acvdf566556 and 95gfgdgd7542545 are mistyped and they are as 74898fababfbef46 and 95efabfeba752545,means they are hexadecimalSenatorial
@spencer7593,suppose all hashes as you'r saying(oviously you are correct) are hexdecimal and we can convert to 64bit binary,then how to match with already stored hashes in mysql?Means,64bit binary to mysql hashes,doubt is now i should convert stored hashes as 64 bit also or what?Senatorial
yes, convert both of the values you are comparing to 64-bit binary. In MySQL, that means the target datatype for the conversion is BIGINT UNSIGNED. You can use either the CAST or CONVERT function, and specify UNSIGNED as the target datatype. The MySQL expression you use to do that conversion depends on the datatype/representation you are converting from. See the answer from Rick James to an example of converting a literal string or varchar column that contains 16 hexadecimal digits.Reinhardt
P
8

Using the last two numbers as an example:

SELECT BIT_COUNT( CAST(CONV('fffa181c1c3e3820', 16, 10) AS UNSIGNED) ^
                  CAST(CONV('fffa381c1c3e3828', 16, 10) AS UNSIGNED) ) ;
--> 2
  • The hashes are hex.
  • The conversion needs to end up with BIGINT UNSIGNED.

(If you had had MD5 (128-bit) or SHA1 (160-bit) hashes, we would have had to split them via SUBSTR(), Xor each pair, BIT_COUNT, then added the results.)

Edit to use column name:

SELECT BIT_COUNT( CAST(CONV( a.pagecontent , 16, 10) AS UNSIGNED) ^
                  CAST(CONV( b.pagecontent , 16, 10) AS UNSIGNED) ) ;
Pastorship answered 23/6, 2015 at 15:47 Comment(9)
That ok,but main question is matching client side data to servor side hashes ,your solution is for client side(means changing before fecthig)but what about already stored i mysql,how to change that?Senatorial
@125fura: "how to change that" (where "that" refers to values that are stored in a column in a MySQL table) that depends on the datatype of the column, and how the binary values are represented. (Are the columns CHAR(16) containing sixteen hex digits, or are the columns defined VARCHAR(21) and contain 'qw 4 fffa181c3c2e3920' as shown in your question. The exact expressions you'd need to use depends on how the binary hash values are being represented in the column.Reinhardt
Please provide SHOW CREATE TABLE.Pastorship
@spencer7593, data are stored as varchar(16) and qw, 4 are name and no. which is not compulsory.Senatorial
@RickJames data table is : CREATE TABLE searchengine ( id INT NOT NULL AUTO_INCREMENT , pageurl BLOB NOT NULL , pagecontent varchar (16) NOT NULL , PRIMARY KEY ( id ) ) ENGINE = MYISAMSenatorial
That should clarify a previous question. It does not change my proposed solution. What issues remain?Pastorship
This answer demonstrates the MySQL expression you would use to get compute the hamming difference on 64-bit binary values, converted from hexadecimal string representation. I think this answers the question that was asked, i.e. how to compute the hamming distance in a MySQL expression.Reinhardt
For anyone using more than 64 bits, caution. This won't work, at least not "as is". You would need to break the hex into 16-digit groups, BIT_COUNT each ^, then add up the bit_counts.Pastorship
Update -- MySQL 8.0 extends bit operators to BLOB (in addition to BIGINT).Pastorship

© 2022 - 2024 — McMap. All rights reserved.