I have written the following SQL CLR
function in order to hash string values larger then 8000 bytes (the limit of input value of the T-SQL
built-it HASHBYTES
function):
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
public static SqlBinary HashBytes(SqlString algorithm, SqlString value)
{
HashAlgorithm algorithmType = HashAlgorithm.Create(algorithm.Value);
if (algorithmType == null || value.IsNull)
{
return new SqlBinary();
}
else
{
byte[] bytes = Encoding.UTF8.GetBytes(value.Value);
return new SqlBinary(algorithmType.ComputeHash(bytes));
}
}
It is working fine for Latin strings. For example, the following hashes are the same:
SELECT dbo.fn_Utils_GetHashBytes ('MD5', 'test'); -- 0x098F6BCD4621D373CADE4E832627B4F6
SELECT HASHBYTES ('MD5', 'test'); -- 0x098F6BCD4621D373CADE4E832627B4F6
The issue is it is not working with Cyrillic strings. For example:
SELECT dbo.fn_Utils_GetHashBytes ('MD5 ', N'даровете на влъхвите') -- NULL
SELECT HashBytes ('MD5 ',N'даровете на влъхвите') -- 0x838B1B625A6074B2BE55CDB7FCEA2832
SELECT dbo.fn_Utils_GetHashBytes ('SHA256', N'даровете на влъхвите') -- 0xA1D65374A0B954F8291E00BC3DD9DF655D8A4A6BF127CFB15BBE794D2A098844
SELECT HashBytes ('SHA2_256',N'даровете на влъхвите') -- 0x375F6993E0ECE1864336E565C8E14848F2A4BAFCF60BC0C8F5636101DD15B25A
I am getting NULL
for MD5
, although the code returns value if it is executed as console application. Could anyone tell what I am doing wrong?
Also, I've got the function from here and one of the comments says that:
Careful with CLR SP parameters being silently truncated to 8000 bytes - I had to tag the parameter with [SqlFacet(MaxSize = -1)] otherwise bytes after the 8000th would simply be ignored!
but I have tested this and it is working fine. For example, if I generate a hash of 8000 bytes string and a second hash of the same string plus one symbol, I get the hashes are different.
DECLARE @A VARCHAR(MAX) = '8000 bytes string...'
DECLARE @B VARCHAR(MAX) = @A + '1'
SELECT LEN(@A), LEN(@B)
SELECT IIF(dbo.fn_Utils_GetHashBytes ('MD5', @A + '1') = dbo.fn_Utils_GetHashBytes ('MD5', @B), 1, 0) -- 0
Should I worry about this?
HashBytes
in your case operates on unicode string, which is utf-16. Your function, on the other hand, takes utf-16 input and converts it to utf-8 before calculating hash. Inputs are diffreent, so are the hashes. – HipboneN
prefix), and for ASCII string utf-8 representation is the same, so hashes are equal. – Hipbone