How do I compare packed values in Perl?

I want to use the pack() function in Perl to encode some data, and then compare my packed structure to another packed structure. The comparison should be on the raw byte values of the packed structures.

According to the documentation, cmp uses the current locale to determine how to compare strings. But I don't want any intelligence applied to the comparison; I want whatever is closest to a memcmp(). Obviously I cannot use <=> for comparing my packed objects, as they are not numbers.

What is the best way to compare packed strings in Perl?

Side note: I have been reading this article on efficient sorting in Perl, which notes that the plain sort function uses a memcmp-like algorithm for comparing structures. I'm wondering how to achieve such a comparison without having to use sort.

Subtilize asked 20/7, 2010 at 8:48. Comments:
sort is really an excellent place to start from. Trying to build your own sort replacement will probably not work as well as you'd like, as the Perl sort has been finely tuned over years. The efficient sorting link you gave actually includes instructions on how to use packed data structures to speed up the sort, which is pretty clever, but the sorting would have to be taking a long time before I'd devote myself to maintaining that. - Belding
Do you want a comparison (i.e., less than, greater than, or equal to) or a yes-or-no equality test? - Giesser
@gbacon: I want something I can order with, so less than, greater than, equal to. - Subtilize
perldoc.perl.org/perllocale.html#The-use-locale-pragma - This says the default is to ignore locale. Where do you read that Perl is using the current locale? The docs for cmp also say locale is used 'only if use locale is in effect'. If 'use locale' is in effect, use 'no locale' as suggested below. - Unamuno
@Unamuno thanks for fixing the typo. - Subtilize
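The packed-key trick from the efficient-sorting article mentioned in the question (pack the sort key so that a plain `sort` with no block can compare it byte-wise, a technique often called the Guttman-Rosler transform) can be sketched roughly as follows. This is not code from the article; the records and the `score` field are hypothetical, and the sketch assumes non-negative integer keys below 2**32 and that `use locale` is not in effect.

```perl
use strict;
use warnings;

# Hypothetical records to sort by a non-negative integer score.
my @records = (
    { name => 'carol', score => 300 },
    { name => 'alice', score => 20 },
    { name => 'bob',   score => 1000 },
);

# Pack each score as a 32-bit big-endian unsigned integer ("N"), followed by
# the record's index. Big-endian byte order makes byte-wise string order
# match numeric order for non-negative integers below 2**32.
my @keys = map { pack('N N', $records[$_]{score}, $_) } 0 .. $#records;

# A plain sort with no comparison block compares the packed strings
# byte by byte; the index is then unpacked to recover each record.
my @sorted = map { $records[ (unpack 'N N', $_)[1] ] } sort @keys;

print join(' ', map { $_->{name} } @sorted), "\n";    # alice carol bob
```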

Disable locale considerations for the block and use cmp as usual:

sub mycmp {
  no locale;           # byte-wise cmp inside this block only
  $_[0] cmp $_[1];
}

The perlop documentation says

lt, le, ge, gt and cmp use the collation (sort) order specified by the current locale if use locale is in effect. See perllocale.

and perllocale adds

The default behavior is restored with the no locale pragma, or upon reaching the end of block enclosing use locale.

For example, running

use feature 'say';

my($one,$two) = map pack("N", $_) => 1, 2;
say mycmp($one, $two);
say mycmp($two, $one);

outputs

-1
1
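A sketch of using `mycmp` as a sort comparator for packed values. The 16-bit values are made up for illustration. They are packed big-endian (`n`) on purpose: only then does byte order match numeric order, so packing them little-endian (`v`) would give a different sorted result.

```perl
use strict;
use warnings;

# Same comparison as in the answer: "no locale" keeps cmp byte-wise
# inside this sub even if "use locale" is in effect elsewhere.
sub mycmp {
    no locale;
    return $_[0] cmp $_[1];
}

# Pack a few illustrative 16-bit big-endian values ("n").
my @packed = map { pack 'n', $_ } 513, 2, 256, 1;

# Sort with the byte-wise comparison, then unpack to inspect the order.
my @sorted = sort { mycmp($a, $b) } @packed;
my @values = map { unpack 'n', $_ } @sorted;

print "@values\n";    # 1 2 256 513
```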
Giesser answered 20/7, 2010 at 15:28. Comments:
Does "no locale" only apply within the closure? If there is a locale that applies outside the closure, will it still apply to any code below the closure? - Subtilize
@PP Yes, the locale pragma is lexical: it's in effect only inside its enclosing block. - Giesser
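A minimal sketch of that lexical scoping: `use locale` is in effect for the whole file, while the hypothetical `bytecmp` switches back to byte-wise comparison only inside its own block. The last comparison runs under `use locale` again, so its result depends on the process's LC_COLLATE setting and no output is shown for it.

```perl
use strict;
use warnings;
use locale;    # from here to the end of the file, cmp/sort follow LC_COLLATE

# Hypothetical helper: "no locale" is lexical, so byte-wise comparison
# applies only inside this block.
sub bytecmp {
    no locale;
    return $_[0] cmp $_[1];
}

my ($x, $y) = (pack('C', 0xE9), pack('C', 0x65));    # "\xE9" vs "e"

# Byte-wise: 0xE9 > 0x65, regardless of the locale the process runs under.
print bytecmp($x, $y), "\n";    # 1

# Outside the sub, "use locale" applies again; the result is locale-dependent.
print +($x cmp $y), "\n";
```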

Expand, then contract: compare, for example, the hex representations of your structures, which use only ASCII characters and cannot run afoul of the locale problem you mention.

unpack('H*', $first) cmp unpack('H*', $second)
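A small self-contained sketch that wraps this expression in a hypothetical `hexcmp` helper and checks it against plain `cmp` (which is byte-wise when `use locale` is not in effect). It relies on `unpack 'H*'` emitting lowercase hex digits; in ASCII, `0`-`9` sort before `a`-`f`, so the hex strings sort in the same order as the bytes they encode.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two packed strings via their hex expansion.
sub hexcmp {
    my ($first, $second) = @_;
    return unpack('H*', $first) cmp unpack('H*', $second);
}

my @pairs = (
    [ pack('N', 1),   pack('N', 2)  ],
    [ pack('N', 255), pack('N', 16) ],
    [ pack('N', 7),   pack('N', 7)  ],
);

for my $p (@pairs) {
    my ($x, $y) = @$p;
    # Without "use locale", plain cmp is already byte-wise, so both agree.
    printf "%d %d\n", hexcmp($x, $y), ($x cmp $y);
}
# prints "-1 -1", then "1 1", then "0 0"
```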
Loganiaceous answered 20/7, 2010 at 10:39

Thinking aloud here: will bitwise operators help? For example, XORing two identical strings gives a string in which every bit is 0.

http://perldoc.perl.org/perlop.html#Bitwise-String-Operators
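A minimal sketch of the XOR idea as an equality test; the helper name `xor_equal` is hypothetical. Two details matter: `^` pads the shorter string with zero bytes, so the lengths must be compared first; and under `use feature 'bitwise'` (enabled by `use v5.28` and later) the string form of the operator is spelled `^.` rather than `^`. This only answers "equal or not", not "less than or greater than".

```perl
use strict;
use warnings;

sub xor_equal {
    my ($x, $y) = @_;
    # ^ pads the shorter operand with "\0", so different lengths must fail first.
    return 0 if length($x) != length($y);
    # Identical strings XOR to all "\0" bytes; any other byte means a mismatch.
    my $diff = $x ^ $y;
    return $diff =~ /[^\0]/ ? 0 : 1;
}

my $p = pack 'N n', 42, 7;
print xor_equal($p, pack('N n', 42, 7)), "\n";    # 1
print xor_equal($p, pack('N n', 42, 8)), "\n";    # 0
print xor_equal("ab", "ab\0"), "\n";              # 0 (different lengths)
```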

Liverwort answered 20/7, 2010 at 9:36. Comments:
XOR would be a great equality test independent of locale, actually (nice idea), but it would not be useful for less than/greater than. - Subtilize