Digest::CRC32 with Zlib

Asked 21/12, 2011 at 18:40 Answered 3/10, 2014 at 22:22

In my code, I need to hash files using a variety of algorithms, including CRC32. Since I'm also using other cryptographic hash functions in the Digest family, I thought it would be nice to maintain a consistent interface for them all.

For the record, I did find digest-crc, a gem which does exactly what I want. The thing is, Zlib is part of the standard library and has a working implementation of CRC32 that I'd like to reuse. Also, it is written in C, so it should perform better than digest-crc, which is a pure-Ruby implementation.

Implementing Digest::CRC32 actually looked pretty straightforward at first:

%w(digest zlib).each { |f| require f }

class Digest::CRC32 < Digest::Class
  include Digest::Instance

  def update(str)
    @crc32 = Zlib.crc32(str, @crc32)
  end

  def initialize; reset; end
  def reset; @crc32 = 0; end
  def finish; @crc32.to_s; end
end
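The update method relies on Zlib.crc32 accepting a running checksum as its optional second argument. Because of this, data can be fed in chunks (as Digest's file method does) and still produce the same value as a single call. Here is a quick check of that property, using an arbitrary sample string:

```ruby
require 'zlib'

# Checksum of the whole string in one call.
whole = Zlib.crc32("hello world")

# Same data fed in two chunks: the second call continues from the
# running value returned by the first, just like update does.
part    = Zlib.crc32("hello ")
chained = Zlib.crc32("world", part)

puts whole == chained  # => true
```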

Everything looks right:

crc32 = File.open('Rakefile') { |f| Zlib.crc32 f.read }
digest = Digest::CRC32.file('Rakefile').digest!.to_i
crc32 == digest
=> true

Unfortunately, not everything works:

Digest::CRC32.file('Rakefile').hexdigest!
=> "313635393830353832"

# What I actually expected was:
Digest::CRC32.file('Rakefile').digest!.to_i.to_s(16)
=> "9e4a9a6"

hexdigest basically returns Digest.hexencode(digest), which works with the value of the digest at the byte level. I'm not sure how that function works, so I was wondering if it is possible to achieve this with just the integer returned from Zlib.crc32.
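To illustrate, here is a small sketch of what Digest.hexencode does with the checksum from the example above (165980582 in decimal, 9e4a9a6 in hex). It encodes each byte of its input as two hex digits, so it only gives the expected result when it is handed the raw bytes of the integer:

```ruby
require 'digest'

crc = 165980582  # the integer checksum from the example above (0x9e4a9a6)

# Given the decimal text, hexencode encodes the ASCII codes of the
# characters '1', '6', '5', ... rather than the number itself.
Digest.hexencode(crc.to_s)         # => "313635393830353832"

# Given the 4 raw bytes of the integer in big-endian order, it yields
# the usual hex checksum, zero-padded to 8 digits.
Digest.hexencode([crc].pack('N'))  # => "09e4a9a6"
```

The padded form 09e4a9a6 is the one to compare against, since Integer#to_s(16) drops the leading zero.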

Hundred answered 21/12, 2011 at 18:40

What ruby platform are you working on? – Forequarter 21/12, 2011 at 21:33

Digest expects digest (which simply returns whatever your finish method returns) to be the raw bytes that make up the checksum, i.e. in the case of CRC32, the 4 bytes that make up the 32-bit integer. Instead, you are returning a string that contains the base-10 representation of that integer.

You want something like

[@crc32].pack('V')

to turn that integer into the bytes that represent it. Do read up on pack and its various format specifiers: there are many ways of packing an integer depending on whether the bytes should be in native, big-endian or little-endian order, so you should figure out which one matches your needs.
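Here is a short sketch comparing the two most likely candidates, 'V' (32-bit unsigned, little-endian) and 'N' (32-bit unsigned, big-endian, also called network order), on an arbitrary sample string. String#unpack1 assumes Ruby 2.4 or later:

```ruby
require 'zlib'

crc = Zlib.crc32("hello world")

little = [crc].pack('V')  # 4 bytes, least significant byte first
big    = [crc].pack('N')  # 4 bytes, most significant byte first

# Both are 4-byte binary strings holding the same number...
little.bytesize             # => 4
little.unpack1('V') == crc  # => true
big.unpack1('N') == crc     # => true

# ...but only the big-endian form hex-encodes to the usual CRC32 notation.
big.unpack1('H*') == crc.to_s(16).rjust(8, '0')  # => true
little.bytes == big.bytes.reverse                # => true
```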

Aracelis answered 22/12, 2011 at 2:49

I used [@crc32].pack('N') to get my version of Digest::CRC32.file(filename) to work as expected. – Copenhaver 17/9, 2012 at 0:35

Sorry, this doesn't really answer your question, but it might help.

First, when reading in a file, make sure you pass the "rb" (binary mode) parameter. I can see you're not on Windows, but if your code does end up being run on a Windows machine, it won't behave the same, especially when reading Ruby source files: in text mode, Windows converts CRLF line endings to LF, so the data you hash is not the same as the bytes on disk. Example:

crc32 = File.open('test.rb') { |f| Zlib.crc32 f.read }
#=> 189072290
digest = Digest::CRC32.file('test.rb').digest!.to_i
#=> 314435800
crc32 == digest
#=> false

crc32 = File.open('test.rb', "rb") { |f| Zlib.crc32 f.read }
#=> 314435800
digest = Digest::CRC32.file('test.rb').digest!.to_i
#=> 314435800
crc32 == digest
#=> true
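If all you need is the Zlib checksum of a whole file, File.binread is a shorter way to get binary mode without passing "rb" by hand. A minimal sketch, using a hypothetical temporary file with CRLF line endings created via Tempfile:

```ruby
require 'zlib'
require 'tempfile'

# Hypothetical sample file with Windows-style (CRLF) line endings.
file = Tempfile.new('crc-demo')
file.binmode
file.write("line one\r\nline two\r\n")
file.close

# File.binread always reads in binary mode, so no newline translation
# happens on any platform and the CRLF bytes are hashed as-is.
crc = Zlib.crc32(File.binread(file.path))
file.unlink

puts crc == Zlib.crc32("line one\r\nline two\r\n")  # => true
```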

The above will work across all platforms and all Rubies, as far as I know. But that's not what you asked.

That said, I'm pretty sure the hexdigest and digest methods in your example above are working as they should.

dig_file = Digest::CRC32.file('test.rb')

test1 = dig_file.hexdigest
#=> "333134343335383030"

test2 = dig_file.digest
#=> "314435800"

# Decode each pair of hex digits back into one byte (the inverse of Digest.hexencode)
def hexdigest_to_digest(h)
  h.unpack('a2'*(h.size/2)).collect {|i| i.hex.chr }.join
end

test3 = hexdigest_to_digest(test1)
#=> "314435800"
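The hexdigest_to_digest helper can also be written with pack, because the 'H*' directive converts a hex string to bytes and unpacking with 'H*' converts back. Here is a sketch reusing the values shown above (unpack1 assumes Ruby 2.4 or later):

```ruby
require 'digest'

# Same helper as above, repeated so this snippet runs on its own.
def hexdigest_to_digest(h)
  h.unpack('a2' * (h.size / 2)).collect { |i| i.hex.chr }.join
end

hex = "333134343335383030"  # the hexdigest shown above

# Array#pack('H*') decodes a hex string into bytes in one call...
raw = [hex].pack('H*')
raw == hexdigest_to_digest(hex)  # => true
raw == "314435800"               # => true

# ...and String#unpack1('H*') goes the other way, like Digest.hexencode.
raw.unpack1('H*') == Digest.hexencode(raw)  # => true
```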

So I'm guessing that either the .to_i.to_s(16) is throwing off your expected result, or your expected result may be wrong. Not sure, but all the best.

Forequarter answered 21/12, 2011 at 23:1

You're onto something there; I think the answer is the opposite of that: digest to hexdigest. I tried something with unpack before to try to force base 16 but really I had no idea what I was doing. I still don't understand it. – Hundred 21/12, 2011 at 23:33

digest outputs the "correct" checksum because it just returns what the finish method returns. In reality, it should return a binary string suitable for Digest.hexencode, which should encode the bytes in hexadecimal. So yeah, it seems both my methods are broken. :) – Hundred 21/12, 2011 at 23:40

It works just fine; just make sure to always use network byte order (big-endian), like this:

def finish; [@crc32].pack('N'); end
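Putting the answers together, here is a minimal sketch of what the complete class might look like. Beyond the finish fix, it returns self from update and reset and adds alias_method :<<, :update. Those are additions not found in the posts above, intended to make sure << chaining goes through the Zlib-backed update. The sample string is arbitrary:

```ruby
require 'digest'
require 'zlib'

# Digest::CRC32 built on Zlib; finish returns the 4 raw bytes of the
# checksum in network (big-endian) order, as suggested above.
class Digest::CRC32 < Digest::Class
  def initialize
    reset
  end

  # Fold more data into the running checksum; return self for chaining.
  def update(str)
    @crc32 = Zlib.crc32(str, @crc32)
    self
  end
  alias_method :<<, :update

  def reset
    @crc32 = 0
    self
  end

  # Raw digest: 4 bytes, so hexdigest yields 8 zero-padded hex digits.
  def finish
    [@crc32].pack('N')
  end
end

d = Digest::CRC32.new
d << "hello " << "world"
d.hexdigest == Zlib.crc32("hello world").to_s(16).rjust(8, '0')  # => true
Digest::CRC32.hexdigest("hello world") == d.hexdigest             # => true
```

With this version, hexdigest returns the zero-padded 8-digit form, while digest returns the 4 raw bytes.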

Desilva answered 3/10, 2014 at 22:22
