Ruby's pack and unpack explained

4

13

Even after reading the standard documentation, I still can't understand how Ruby's Array#pack and String#unpack exactly work. Here is the example that's causing me the most trouble:

irb(main):001:0> chars = ["61","62","63"]
=> ["61", "62", "63"]
irb(main):002:0> chars.pack("H*")
=> "a"
irb(main):003:0> chars.pack("HHH")
=> "```"

I expected both these operations to return the same output: "abc". Each of them "fails" in a different manner (not really a fail since I probably expect the wrong thing). So two questions:

  1. What is the logic behind those outputs?
  2. How can I achieve the effect I want, i.e. transforming a sequence of hexadecimal numbers into the corresponding string? Even better: given an integer n, how can I transform it into a string identical to the contents of a file that, when read as a number (say, in a hex editor), equals n?
Wildfire answered 5/12, 2012 at 17:57 Comment(1)
For 'H' formats, * isn't acting in an expected manner according to the documentation. Other format characters seem to behave correctly, so I suspect it's a bug in Ruby's use of 'H*'. - Importunacy
I
13

We were working on a similar problem this morning. If the array size is unknown, you can use:

ary = ["61", "62", "63"]
ary.pack('H2' * ary.size)
=> "abc"

You can reverse it using:

str = "abc"
str.unpack('H2' * str.size)
=> ["61", "62", "63"]
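A minimal sketch of the round trip above, wrapped in two helpers. The method names are hypothetical, not part of Ruby. One assumption differs from the snippet above: the reverse direction counts bytes with `bytesize` rather than characters with `size`, since 'H2' consumes one byte at a time and a multibyte string has more bytes than characters.

```ruby
# Hypothetical helper names, for illustration only.

# Pack an array of two-digit hex strings into one binary string.
def hex_pairs_to_string(pairs)
  pairs.pack('H2' * pairs.size)
end

# Split a string into one two-digit hex string per byte.
# bytesize (not size) is used so multibyte characters are not truncated.
def string_to_hex_pairs(str)
  str.unpack('H2' * str.bytesize)
end

p hex_pairs_to_string(%w[61 62 63])  # => "abc"
p string_to_hex_pairs("abc")         # => ["61", "62", "63"]
```

Note that pack returns a binary (ASCII-8BIT) string, so non-ASCII results may need `force_encoding` before being compared with UTF-8 text.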
Importunacy answered 5/12, 2012 at 18:41 Comment(7)
Would this be efficient for large input? - Spit
Should be very efficient. The only added cost I see is creating the temporary format string, which 'H*' would have to do anyway. - Importunacy
In general H seems to be "special". We worked on several different ways to get it to work in a manner consistent with the other format characters, and it wouldn't, so we hit upon using the size to extend the H2 string as necessary. - Importunacy
H isn't too special: all the "string" consuming format codes (AaHhZBbuMmPp) interpret the length suffix as the number of input elements they consume from the current string element, not the number of elements to be consumed from the array. - Sihonn
Except that the documentation says "If the count is an asterisk (“*”), all remaining array elements will be converted." They're not though, only the first one is. - Importunacy
@theTinMan Ah, I see now. That's a documentation bug. Ruby's pack is modeled after Perl's, and perldoc is unambiguous that the count value has a different meaning for AaZHhBbPp. - Sihonn
@theTinMan Why did you use H2? I'm asking so I can learn: could you say why not just H? - Widespread
T
12

The 'H' String directive for Array#pack says that array contents should be interpreted as nibbles of hex strings.

In the first example you've provided:

irb(main):002:0> chars.pack("H*")
=> "a"

Here you're telling pack to treat the first element of the array as a sequence of nibbles (half bytes) of a hex string: 0x61 in this case, which corresponds to the ASCII character 'a'.

In the second example:

irb(main):003:0> chars.pack("HHH")
=> "```"

Here you're telling pack to take 3 elements of the array and read a single nibble (the high one) from each: 0x60 corresponds to the ASCII character '`'. The low part, or second nibble (0x1), "gets lost" because the template string has no '2' or '*' modifier.
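To make the two cases concrete, here is a small sketch comparing how the count after 'H' is applied. The helper name `demo_pack` is hypothetical; the last line uses the integer directive 'C' only as a contrast, to show a directive where '*' does span the remaining array elements.

```ruby
# Hypothetical helper: packs the question's array with a given template.
def demo_pack(template)
  ["61", "62", "63"].pack(template)
end

p demo_pack("H")    # => "`"    one nibble ("6") of the first element, i.e. 0x60
p demo_pack("H2")   # => "a"    two nibbles of the first element, i.e. 0x61
p demo_pack("H*")   # => "a"    all nibbles, but still only of the first element
p demo_pack("HHH")  # => "```"  one nibble from each of the three elements

# For integer directives the count applies to array elements instead.
p [0x61, 0x62, 0x63].pack("C*")  # => "abc"
```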

What you need is:

chars.pack('H*' * chars.size)

in order to pack all the nibbles of all the elements of the array as if they were hex strings.

The 'H2' * chars.size variant only works if each array element is a hex string representing exactly 1 byte.

It means that for something like chars = ["6161", "6262", "6363"] the result is going to be incomplete:

2.1.5 :047 > chars = ["6161", "6262", "6363"]
 => ["6161", "6262", "6363"] 
2.1.5 :048 > chars.pack('H2' * chars.size)
 => "abc" 

while:

2.1.5 :049 > chars.pack('H*' * chars.size)
 => "aabbcc"
Textuary answered 29/1, 2015 at 14:16 Comment(0)
S
5

The Array#pack method is pretty arcane. Addressing question (2), I was able to get your example to work by doing this:

> ["61", "62", "63"].pack("H2H2H2")
=> "abc" 

See the Ruby documentation for a similar example. Here is a more general way to do it:

["61", "62", "63"].map {|s| [s].pack("H2") }.join

This is probably not the most efficient way to tackle your problem; I suspect there is a better way, but it would help to know what kind of input you are starting out with.

The #pack method is common to other languages, such as Perl. If Ruby's documentation does not help, you might consult analogous documentation elsewhere.

Spit answered 5/12, 2012 at 18:11 Comment(0)
E
4

I expected both these operations to return the same output: "abc".

The easiest way to understand why your approach didn't work is to start with what you are expecting:

"abc".unpack("H*")
# => ["616263"]

["616263"].pack("H*")
# => "abc"

So, it seems that Ruby expects your hex bytes in one long string instead of separate elements of an array. So the simplest answer to your original question would be this:

chars = ["61", "62", "63"]
[chars.join].pack("H*")
# => "abc"

This approach also seems to perform comparably well for large input:

require 'benchmark'

chars = ["61", "62", "63"] * 100000

Benchmark.bmbm do |bm|
  bm.report("join pack") do [chars.join].pack("H*") end
  bm.report("big pack") do chars.pack("H2" * chars.size) end
  bm.report("map pack") do chars.map{ |s| [s].pack("H2") }.join end
end

#                 user     system      total        real
# join pack   0.030000   0.000000   0.030000 (  0.025558)
# big pack    0.030000   0.000000   0.030000 (  0.027773)
# map pack    0.230000   0.010000   0.240000 (  0.241117)
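The second half of question (2), turning an integer n into the string whose bytes read as n, can be sketched with the same "H*" idea. This is only an illustration under stated assumptions: the method names are hypothetical, n is assumed to be non-negative, and the bytes are emitted most significant first (the order a hex editor displays them). `String#unpack1` requires Ruby 2.4 or later.

```ruby
# Hypothetical helper names, for illustration only.

# Convert a non-negative integer to its big-endian byte string.
def int_to_bytes(n)
  raise ArgumentError, "n must be non-negative" if n.negative?

  hex = n.to_s(16)
  hex = "0#{hex}" if hex.length.odd?  # pad to whole bytes so no nibble is misaligned
  [hex].pack("H*")
end

# Reverse direction: read a byte string as one big-endian integer.
def bytes_to_int(str)
  str.unpack1("H*").to_i(16)
end

p int_to_bytes(0x616263)  # => "abc"
p bytes_to_int("abc")     # => 6382179
```

The padding step matters because 'H' fills the high nibble first: without it, a value such as 0xabc would be packed as 0xab 0xc0 rather than 0x0a 0xbc.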
Eurydice answered 1/10, 2014 at 9:0 Comment(0)
