How to calculate the real SHA1 of a text?
Asked Answered
E

1

6

As in my last question (see details there), I am using

SELECT encode(digest(x::text::bytea, 'sha1'), 'hex') FROM xtmp;

Not solved, is not the same hash tham original... Perhaps the ::text enforced internal representation with \n symbols, so a solution will be direct cast to bytea, but it is an invalid cast.

The other workaround also not is a solution,

SELECT encode(digest( replace(x::text,'\n',E'\n')::bytea, 'sha1' ), 'hex') 
FROM xtmp

... I try CREATE TABLE btmp (x bytea) and COPY btmp FROM '/tmp/test.xml' ( FORMAT binary ), but error ("unknown COPY file signature").

Extravagancy answered 26/1, 2018 at 20:42 Comment(0)
E
5

Simple solution! Add a "\n".

SELECT encode(digest((x::text||E'\n')::bytea, 'sha1'), 'hex') FROM xtmp;

But the real problem is to get original file without cut the last "\n" (last EOL)... Lets see the function at my old test-kit:

 INSERT INTO  xtmp (x) 
  SELECT array_to_string(array_agg(x),E'\n')::xml FROM ttmp
;

It is here the "bug" (after workaround the COPY that not load a full-file into one row one field).
The array_to_string() not adds the last EOL, so concatenating by || E'\n' fixed the bug.


NOTES

Checking other hypothesis and offering a good solution to the test-kit.

POSIX rule is not the problem ...

The end-of-line (EOL) is an obligation at POSIX filesystems (and non-binary mode), see this answer about EOL. We can imagine something like "string and file representations differ by an EOL"... We can check? it differs?

We can demonstrate by terminal that there are no "string vs file" problem, no strange addiction of EOL:

printf "<root/>" > original1.xml 
printf "<root/>\n" > original2.xml 
sha1sum original*.xml
printf "<root/>" | openssl sha1
printf "<root/>\n" | openssl sha1

results

062c3db8ce3458fc3ccaf2f930bf663d8ce31d7d  original1.xml
a05d91cbf0902b0fe341c979e9fc18fc69813f55  original2.xml
(stdin)= 062c3db8ce3458fc3ccaf2f930bf663d8ce31d7d
(stdin)= a05d91cbf0902b0fe341c979e9fc18fc69813f55

So sha1sum is not using an extra EOL, the string and the file are the some.

Now at SQL, the same conclusions:

SELECT encode(digest('<root/>'::bytea, 'sha1'), 'hex') ;
SELECT encode(digest(E'<root/>\n'::bytea, 'sha1'), 'hex') ;

results

 062c3db8ce3458fc3ccaf2f930bf663d8ce31d7d
 a05d91cbf0902b0fe341c979e9fc18fc69813f55

Solution for a better test-kit

The COPY command is ugly for this simple load/save text procedure, lets use a direct getfile function instead:

CREATE FUNCTION getfile(p_file text) RETURNS text AS $$
   with open(args[0],"r") as content_file:
       content = content_file.read()
   return content
$$ LANGUAGE PLpythonU;

SELECT encode(digest( getfile('/tmp/original1.xml') ::bytea, 'sha1'), 'hex') ;
SELECT encode(digest( getfile('/tmp/original2.xml') ::bytea, 'sha1'), 'hex') ;

results

062c3db8ce3458fc3ccaf2f930bf663d8ce31d7d
a05d91cbf0902b0fe341c979e9fc18fc69813f55

Perfect (!), no EOL problem now.

Extravagancy answered 26/1, 2018 at 20:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.