DCPcrypt Hashing German Umlauts

I am using DCPcrypt and SHA512 to hash strings.

I am using the version by Warren Postma https://bitbucket.org/wpostma/dcpcrypt2010

It works fine for plain ASCII input. However, it fails with German umlauts like ä, ö, ü, and probably with other non-ASCII Unicode characters.

I am using the library like this:

function TForm1.genhash(str: string): string;
var
  Hash  : TDCP_sha512;
  Digest: array[0..63] of byte;
  i: integer;
  s: string;
begin
  s:= '';
  hash  := TDCP_sha512.Create(nil);
  if hash<>nil then
  begin
    try
      Hash.Init;
      Hash.UpdateStr(str);
      Hash.Final(Digest);

      for i:= 0 to length(Digest)-1 do
        s:= s + IntToHex(Digest[i],2);

    finally
      hash.free;
    end;

  end;
  Result := s;
end;

When I input the letter ä, I expect the output to be:

64868C5784A6004E675BCF405F549369BF607CD3269C0CAC1711E21BA9F40A5ABBF0C7535856E7CF77EA55A072DD04AA89EEA361E95F497AA965309B50587157

I checked it with these sites: http://hashgenerator.de/ http://passwordsgenerator.net/sha512-hash-generator/

However, I get:

1A7F725BD18E062020A646D4639F264891368863160A74DF2BFC069C4DADE04E6FA854A2474166EED0914B922A9D8BE0C89858D437DDD7FBCA5C9C89FC07323A

So my question is: how can I use the DCPcrypt library to generate hashes for strings containing German umlauts? Thanks.

Mice asked 3/3, 2016 at 9:35

This must be the single most common mistake that people make with hashing and encryption. These algorithms operate on binary data, but you are passing text. Something, somewhere, has to encode that text as binary. And which encoding should be used? How do you know that your library uses the same one as the online tool? You don't.

So, here's a rule for you to follow. Never hash text. Just don't do it. Encode the text as binary using a well-defined, explicitly chosen encoding. And hash that. I suggest you encode as UTF-8 and hash that. So, TEncoding.UTF8.GetBytes(...) is your friend here.
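
To see why the choice of encoding matters, here is a minimal sketch that prints the bytes each encoding produces for ä. It is an illustration only and does not touch DCPcrypt. The helper name DumpBytes is made up for this example, and TEncoding.Default is assumed to map to the system ANSI code page, as it does on Windows.

```pascal
program EncodingBytes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: writes a label followed by the bytes in hex.
procedure DumpBytes(const Caption: string; const Bytes: TBytes);
var
  i: Integer;
begin
  Write(Caption, ': ');
  for i := 0 to High(Bytes) do
    Write(IntToHex(Bytes[i], 2), ' ');
  Writeln;
end;

var
  S: string;
begin
  // 'ä' written as a code point, so the source file encoding does not matter
  S := #$00E4;
  DumpBytes('UTF-8   ', TEncoding.UTF8.GetBytes(S));    // C3 A4
  DumpBytes('UTF-16LE', TEncoding.Unicode.GetBytes(S)); // E4 00
  // Assumption: on a system with ANSI code page 1252 this is the single byte E4
  DumpBytes('ANSI    ', TEncoding.Default.GetBytes(S));
end.
```

The same character yields different byte sequences, and each byte sequence hashes to a different SHA-512 digest. That is why the encoding has to be chosen explicitly rather than left to an implicit conversion.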

Now, looking at the actual detail here, you are calling this method:

procedure UpdateStr(const Str: RawByteString);

The RawByteString parameter means that your Unicode text is implicitly converted to an ANSI string using the default system code page. I'm sure that's not what you intend to happen. Indeed the compiler says this:

[dcc32 Warning] W1058 Implicit string cast with potential data loss from 'string' to 'RawByteString'

So the compiler is telling you that you are doing something wrong. You really must take good heed of compiler messages.

Now, you could call UpdateUnicodeStr instead of UpdateStr. But again, how do you know what encoding is used? It happens to be the native internal encoding, UTF-16LE.

But let's follow my rule of never hashing text.

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, DCPsha512;

function genhash(str: string): string;
var
  Bytes: TBytes;
  Hash: TDCP_sha512;
  Digest: array[0..63] of byte;
begin
  Bytes := TEncoding.UTF8.GetBytes(str); // encode text as UTF-8 bytes

  hash := TDCP_sha512.Create(nil);
  try
    Hash.Init;
    Hash.Update(Pointer(Bytes)^, Length(Bytes));
    Hash.Final(Digest);
  finally
    hash.Free;
  end;

  // convert the digest to a hex hash string
  SetLength(Result, Length(Digest)*2);
  BinToHex(Digest, PChar(Result), Length(Digest));
end;

begin
  Writeln(genhash('ä'));
  Readln;
end.

Output

64868C5784A6004E675BCF405F549369BF607CD3269C0CAC1711E21BA9F40A5ABBF0C7535856E7CF77EA55A072DD04AA89EEA361E95F497AA965309B50587157

Note that I simplified the code in some other ways. I removed the local string variable and worked directly with Result. I used BinToHex from the Classes unit to do the digest to hex conversion. I also changed this code:

hash := TDCP_sha512.Create(nil);
if hash<>nil then
  ....

to remove the if statement which is not needed. If a constructor fails, an exception is raised.

Please follow my rule never to hash text. It will serve you well!

Watercress answered 3/3, 2016 at 10:14

Comments:
Thanks David for this great answer. Appreciate the explanation and definitely learned something! (Mice)
Great. And thank you for the question. It's nice to get a good clear statement of the problem, and have the opportunity to finally write down what has been nagging away at me for some time. I hope that we can use this Q&A to spread the word about binary and text with hashing and encryption! (Watercress)
