SQL Server 2014 Hashbytes of a nvarchar(max) result is nvarchar(max)
Asked Answered
P

4

12

Using SQL Server 2014 I have a table that has a nvarchar(max) column called [ASCII File] which can contain an ASCII text file of many K. I then want to do a MD5 hashbytes on that file and the resultant hash should always be 20 bytes.

Well when I do a select of hashbytes('MD5', [ASCII File]) I get query completed with errors

Msg 8152, Level 16, State 10, Line 4
String or binary data would be truncated.

I get the same message when I try

left(hashbytes('MD5', [ASCII File]), 50)

I get the same message when I try

convert(varchar(50), hashbytes('MD5', [ASCII File]))

It seems like since the column I am doing the hashbytes on is nvarchar(max), the result of the hashbytes function also is nvarchar(max).

Can you tell me how I can get the result to be the expected 20 long and not something so long it has to be truncated?

Pierce answered 30/7, 2016 at 4:10 Comment(0)
T
15

It seems like since the field I am doing the hashbytes on is nvarchar(max) the result of the hashbytes is nvarchar(max).

No, that is not possible, especially since the return value of HASHBYTES is a VARBINARY. Also, since your tests were just SELECT statements and not an INSERT statement, there is no way for the return value to get a truncation error. The truncation error is coming from the input value. As stated in that linked MSDN page for HASHBYTES (for SQL Server 2012 and 2014):

Allowed input values are limited to 8000 bytes. The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

That really says it all: the input is limited to 8000 bytes, and the output is a fixed number of bytes, based on the specified algorithm.

The updated documentation, for SQL Server 2016 (which has removed the 8000 byte limitation), states:

For SQL Server 2014 and earlier, allowed input values are limited to 8000 bytes.

You can run a simple test:

DECLARE @Test NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000);
SELECT LEN(@Test);
SELECT HASHBYTES('MD5', @Test);

Returns:

50000

Msg 8152, Level 16, State 10, Line 3
String or binary data would be truncated.

If you want to pass in more than 8000 bytes to a hash function in a version of SQL Server prior to 2016, then you need to use SQLCLR. You can either write your own function, or you can download and install the Free version of the SQL# SQLCLR library (which I created), and use the Util_Hash and Util_HashBinary functions:

DECLARE @Test NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000);
SELECT LEN(@Test);
SELECT SQL#.Util_Hash('MD5', CONVERT(VARBINARY(MAX), @Test));
SELECT SQL#.Util_HashBinary('MD5', CONVERT(VARBINARY(MAX), @Test));

Returns:

50000
40752EB301B41EEAEB309348CE9711D6
0x40752EB301B41EEAEB309348CE9711D6

UPDATE

In the case of using a VARCHAR(MAX) column or variable but with 8000 or fewer characters (or an NVARCHAR(MAX) column or variable with 4000 or fewer characters), there will be no issue and everything will work as expected:

DECLARE @Test VARCHAR(MAX) = REPLICATE('t', 5000);
SELECT LEN(@Test) AS [Characters], 
       HASHBYTES('MD5', @Test) AS [MD5];

Returns:

5000    0x6ABFBA10B49157F2EF8C85862B6E6313
Torpor answered 30/7, 2016 at 5:43 Comment(2)
You misunderstood. The 8000 characters is not the issue. MY test data files are only 1k or 2k way under the limit and yet because the Column is defined as varchar(max) it throws that error. The column is defined as varchar(max) because in the future we may or may not have larger files and we will be going to SQL Server 2016.Pierce
@Pierce Yes, I did misunderstand, but that's due to you stating in the question "ASCII text file of many K" and the error that you are describing not being possible given the details you just clarified in the comment above. I updated my answer to add an example at the end showing that this works. As I, and the documentation, state: the output of HASHBYTES is always VARBINARY(8000). It's also not possible to get a truncation error from a return value on a simple SELECT statement. A return value can only get a truncation error when inserting into a field smaller than the supplied value.Torpor
M
5

In SQL Server 2016 we don't have any more the problem of length of input parameter for HASHBYTES function.

DECLARE @Test NVARCHAR(MAX); 
SET @Test = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000000);
SELECT LEN(@Test);
SELECT HASHBYTES('SHA2_512', @Test);

HASHBYTES (Transact-SQL)

Mc answered 15/5, 2017 at 14:44 Comment(0)
H
4

If you are trying to convert a large varbinary or image file already in sql then there are some built in functions that can do this (possibly from 2014 onwards), this simple function will work for both varbinary(max) and older Image fields..

/****** Object:  UserDefinedFunction [dbo].[MD5Bin]    Script Date: 16/07/2018 11:04:26 ******/

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

-- ==================================================
-- Author:    Darren Steven
-- Create date: 16/07/2018
-- Description: Hashes a binary or image field with MD5
-- ==================================================
CREATE FUNCTION [dbo].[MD5Bin](@value varbinary(max))
RETURNS varchar(32)
AS
BEGIN
  RETURN SUBSTRING(master.sys.fn_sqlvarbasetostr(master.sys.fn_repl_hash_binary(@value)),3,32);
END
GO

then simply call the function with your select:

SELECT dbo.MD5Bin(imageFieldName) from dbo.yourTable 
Headphone answered 16/7, 2018 at 10:18 Comment(1)
Works on SQL2008R2Vala
P
2

The input length limit of 8,000 bytes for the HASHBYTES (Transact-SQL) function is removed in sql 2016

Based on algorithm below are the output data size 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256 512 bits (64 bytes) for SHA2_512.

Prosecutor answered 19/4, 2017 at 17:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.