How can I check if a Perl array contains a particular value?
Asked Answered
G

13

284

I am trying to figure out a way of checking for the existence of a value in an array without iterating through the array.

I am reading a file for a parameter. I have a long list of parameters I do not want to deal with. I placed these unwanted parameters in an array @badparams.

I want to read a new parameter and if it does not exist in @badparams, process it. If it does exist in @badparams, go to the next read.

Gyrostat answered 18/5, 2010 at 19:3 Comment(6)
For the record, the answer does depend on your situation. It sounds like you want to make repeated lookups, so using a hash as jkramer suggests is good. If you only wanted to make one lookup, you might as well just iterate. (And in some cases you might want to binary search instead of using a hash!)Boatswain
perldoc -f grepFeathered
For the record (and this may be totally inapplicable to your situation) it is generally a better idea to identify 'good values' and ignore the rest rather than trying to weed out known 'bad values'. The question you need to ask is whether it's possible there may be some bad values you don't know about yet.Question
Use black magic to find out whether an item is in an array without inspecting the array! As the question is, there is no answer possible! "I am trying to figure out a way of checking for the existence of a value in an array without iterating through the array."Aubreir
Probably you want %badparams, not @badparams.Aubreir
"As the question is, there is no answer possible!" -- not true ... if they create a hash whose keys are the values in the array then they don't have to iterate through the array. "Probably you want %badparams, not @badparams" -- since the OP has @badparams, they can create %badparams from it.Philanthropy
R
209

Simply turn the array into a hash:

my %params = map { $_ => 1 } @badparams;

if(exists($params{$someparam})) { ... }

You can also add more (unique) params to the list:

$params{$newparam} = 1;

And later get a list of (unique) params back:

@badparams = keys %params;
Reiko answered 18/5, 2010 at 19:5 Comment(4)
For the record, this code still does iterate through the array. The map{} call simply makes that iteration very easy to type.Cavanaugh
I'd only do this if your values in @badparams are pseudo-static and you plan to do a lot of checks against the map. I would not recommend this for a single check.Iams
Won't this go bang for an array with multiple items with the same value?Creel
@RobWells no, it will work fine. The next time it see's the same value, it will just overwrite the entry in the hash, which in this case sets it to 1 again.Conjuration
I
260

Best general purpose - Especially short arrays (1000 items or less) and coders that are unsure of what optimizations best suit their needs.

# $value can be any regex. be safe
if ( grep( /^$value$/, @array ) ) {
  print "found it";
}

It has been mentioned that grep passes through all values even if the first value in the array matches. This is true, however grep is still extremely fast for most cases. If you're talking about short arrays (less than 1000 items) then most algorithms are going to be pretty fast anyway. If you're talking about very long arrays (1,000,000 items) grep is acceptably quick regardless of whether the item is the first or the middle or last in the array.

Optimization Cases for longer arrays:

If your array is sorted, use a "binary search".

If the same array is repeatedly searched many times, copy it into a hash first and then check the hash. If memory is a concern, then move each item from the array into the hash. More memory efficient but destroys the original array.

If same values are searched repeatedly within the array, lazily build a cache. (as each item is searched, first check if the search result was stored in a persisted hash. if the search result is not found in the hash, then search the array and put the result in the persisted hash so that next time we'll find it in the hash and skip the search).

Note: these optimizations will only be faster when dealing with long arrays. Don't over optimize.

Iams answered 25/6, 2011 at 1:59 Comment(9)
I find this more readable than the hash method. The only time the hash method would seem to make sense would be when you need to check for multiple values.Ssw
Double tilde was introduced in Perl 5.10Minim
@DennisWilliamson ...and in 5.18 it's considered experimantal.Ecker
Avoid smartmatch in production code. It's unstable/experimental pending further notice.Thermionics
I find it also more readable but Do not use says that it is not efficient and checks every element even if it is the first.Anthracosis
Using perl 5.8.8 and it throws error: syntax error at script.pl line 646, near "$series_id ~"Araceli
Do not use if ("value" ~~ @array). ~~ is an experimental feature called Smartmatch. The experiment appears to be deemed a failure and will be removed or modified in a future version of Perl.Achitophel
Using /^$value$/ is problematic if you are not sure about the contents of $value --- if there are regexp meta characters (e.g. just an open parenthesis), then the call will likely fail. Either write it /^\Q$value\E$/, or avoid the regexp completely, using an eq (which is unfortunately slightly less readable): grep { $_ eq $value } ....Printer
In theory, using List::Util::any (available in perls since 5.20, otherwise upgrade it from CPAN) should be faster than using grep (and is equally readable), as it needs only half of the operations in average.Printer
R
209

Simply turn the array into a hash:

my %params = map { $_ => 1 } @badparams;

if(exists($params{$someparam})) { ... }

You can also add more (unique) params to the list:

$params{$newparam} = 1;

And later get a list of (unique) params back:

@badparams = keys %params;
Reiko answered 18/5, 2010 at 19:5 Comment(4)
For the record, this code still does iterate through the array. The map{} call simply makes that iteration very easy to type.Cavanaugh
I'd only do this if your values in @badparams are pseudo-static and you plan to do a lot of checks against the map. I would not recommend this for a single check.Iams
Won't this go bang for an array with multiple items with the same value?Creel
@RobWells no, it will work fine. The next time it see's the same value, it will just overwrite the entry in the hash, which in this case sets it to 1 again.Conjuration
I
145

You can use smartmatch feature in Perl 5.10 as follows:

For literal value lookup doing below will do the trick.

if ( "value" ~~ @array ) 

For scalar lookup, doing below will work as above.

if ($val ~~ @array)

For inline array doing below, will work as above.

if ( $var ~~ ['bar', 'value', 'foo'] ) 

In Perl 5.18 smartmatch is flagged as experimental therefore you need to turn off the warnings by turning on experimental pragma by adding below to your script/module:

use experimental 'smartmatch';

Alternatively if you want to avoid the use of smartmatch - then as Aaron said use:

if ( grep( /^$value$/, @array ) ) {
  #TODO:
}
Inexpensive answered 13/12, 2013 at 15:54 Comment(5)
This is nice but seems to be new to Perl 5.10. Took some time before I figured out why I'm getting syntax errors.Twinscrew
Warning: you might want to avoid this one, since the operator has apparently different behavior in different versions, and has in the meantime been marked as experimental. So unless you have full control over your perl version (and who has that) you should probably avoid it.Continually
I like this explanation about why setting use experimental 'smartmatch' is recommended. As I have control of my perl version (internal system), I use the no warnings 'experimental::smartmatch'; statement.Stretchy
PLEASE Avoid using this if ( "value" ~~ @array ) . It is experimental, and often skips/misses out many matches. It produces random match results on the same data set, when run at multiple times. I almost lost my job when I used it on critical client data :-((Canty
Warning (again): as of Perl 5.42 the ~~ operator and smartmatching has been entirely removed. Do not use it.Confidential
B
52

This blog post discusses the best answers to this question.

As a short summary, if you can install CPAN modules then the most readable solutions are:

any(@ingredients) eq 'flour';

or

@ingredients->contains('flour');

However, a more common idiom is:

any { $_ eq 'flour' } @ingredients

But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the ~~ "Smart match" operator: it is broken. And don't use grep() nor the solution with a hash: they iterate through the whole list.

any() will stop as soon as it finds your value.

Check out the blog post for more details.

Bastien answered 22/5, 2013 at 11:37 Comment(3)
any needs use List::Util qw(any);. List::Util is in Core modules.Chucklehead
For the sake of clarification, contains is not either builtins either provided by CPAN, the blog just show how to implement it.Bigeye
First variant gives me: Type of arg 1 to List::Util::any must be block or sub {} (not private array), where I used an @array like: any(@gdc_link_passthrough) eq $arg, with $arg being bound to a string argument.Zucchetto
U
27

Method 1: grep (may careful while value is expected to be a regex).

Try to avoid using grep, if looking at resources.

if ( grep( /^$value$/, @badparams ) ) {
  print "found";
}

Method 2: Linear Search

for (@badparams) {
    if ($_ eq $value) {
       print "found";
       last;
    }
}

Method 3: Use a hash

my %hash = map {$_ => 1} @badparams;
print "found" if (exists $hash{$value});

Method 4: smartmatch

(added in Perl 5.10, marked is experimental in Perl 5.18).

use experimental 'smartmatch';  # for perl 5.18
print "found" if ($value ~~ @badparams);

Method 5: Use the module List::MoreUtils

use List::MoreUtils qw(any);
@badparams = (1,2,3);
$value = 1;
print "found" if any {$_ == $value} @badparams;
Unload answered 9/5, 2017 at 14:55 Comment(0)
S
12

@eakssjo's benchmark is broken - measures creating hashes in loop vs creating regexes in loop. Fixed version (plus I've added List::Util::first and List::MoreUtils::any):

use List::Util qw(first);
use List::MoreUtils qw(any);
use Benchmark;

my @list = ( 1..10_000 );
my $hit = 5_000;
my $hit_regex = qr/^$hit$/; # precompute regex
my %params;
$params{$_} = 1 for @list;  # precompute hash
timethese(
    100_000, {
        'any' => sub {
            die unless ( any { $hit_regex } @list );
        },
        'first' => sub {
            die unless ( first { $hit_regex } @list );
        },
        'grep' => sub {
            die unless ( grep { $hit_regex } @list );
        },
        'hash' => sub {
            die unless ( $params{$hit} );
        },
    });

And result (it's for 100_000 iterations, ten times more than in @eakssjo's answer):

Benchmark: timing 100000 iterations of any, first, grep, hash...
       any:  0 wallclock secs ( 0.67 usr +  0.00 sys =  0.67 CPU) @ 149253.73/s (n=100000)
     first:  1 wallclock secs ( 0.63 usr +  0.01 sys =  0.64 CPU) @ 156250.00/s (n=100000)
      grep: 42 wallclock secs (41.95 usr +  0.08 sys = 42.03 CPU) @ 2379.25/s (n=100000)
      hash:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 10000000.00/s (n=100000)
            (warning: too few iterations for a reliable count)
Samothrace answered 18/12, 2012 at 18:48 Comment(2)
If you want to test for multiple elements, then creating the hash in advance saves you time. But if you just want to know whether it contains a single element, then you don't have the hash already. Therefore, creating the hash should be part of the computing time. Even more so for the regular expression: you need a new regexp for each element you look for.Fribourg
@Fribourg True, but if you're only interested in one check, not multiple checks, then obviously it's microoptimization to even wonder about which method is quicker because those microseconds don't matter. If you'd want to redo this check, hash is the way to go, cause cost of creating hash once is small enough to be ignored then. Above benchmarks measure only different ways of testing, not including any setup. Yes, this may be invalid in your use case, but again - if you're doing only single check you should use whatever is most readable to you and your mates.Ecker
G
11

Even though it's convenient to use, it seems like the convert-to-hash solution costs quite a lot of performance, which was an issue for me.

#!/usr/bin/perl
use Benchmark;
my @list;
for (1..10_000) {
    push @list, $_;
}

timethese(10000, {
  'grep'    => sub {
            if ( grep(/^5000$/o, @list) ) {
                # code
            }
        },
  'hash'    => sub {
            my %params = map { $_ => 1 } @list;
            if ( exists($params{5000}) ) {
                # code
            }
        },
});

Output of benchmark test:

Benchmark: timing 10000 iterations of grep, hash...
          grep:  8 wallclock secs ( 7.95 usr +  0.00 sys =  7.95 CPU) @ 1257.86/s (n=10000)
          hash: 50 wallclock secs (49.68 usr +  0.01 sys = 49.69 CPU) @ 201.25/s (n=10000)
Gremlin answered 22/11, 2012 at 16:4 Comment(6)
Using List::Util::first is faster as it stops iterating when it finds a match.Tours
-1 Your benchmark has defects, grep is significantly slower than creating hash and doing lookup, since former is O(n) and latter O(1). Just do the hash creation only once (outside the loop) and precompute regex to measure methods only (see my answer).Ecker
@Xaerxess: In my case I wanted to do one lookup, so I think it's fair to count both hash/regex creation and the lookup/grep. It the task would be to do many lookups, I guess your solution is better.Gremlin
If you want to do only one iteration, the difference is indistinguishable between any of methods you choose, so any benchmark is wrong since it's an evil microoptimization in this case.Ecker
Well of course it takes longer to convert an array to a hash and do a single lookup, than to do a single lookup in that array.Vaudeville
The regex is compiled only once, as it has the flag 'o'.Safeguard
M
3

@files is an existing array

my @new_values =  grep(/^2[\d].[\d][A-za-z]?/,@files);

print join("\n", @new_values);

print "\n";

/^2[\d].[\d][A-za-z]?/ = vaues starting from 2 here you can put any regular expression

Menado answered 28/2, 2018 at 4:47 Comment(0)
L
2

You certainly want a hash here. Place the bad parameters as keys in the hash, then decide whether a particular parameter exists in the hash.

our %bad_params = map { $_ => 1 } qw(badparam1 badparam2 badparam3)

if ($bad_params{$new_param}) {
  print "That is a bad parameter\n";
}

If you are really interested in doing it with an array, look at List::Util or List::MoreUtils

Laski answered 18/5, 2010 at 19:24 Comment(0)
C
1

If you need to know the amount of every element in array besides existing of that element you may use

my %bad_param_lookup;
@bad_param_lookup{ @bad_params } = ( 1 ) x @bad_params;
%bad_param_lookup = map { $_ => $bad_param_lookup{$_}++} @bad_params;

and then for every $i that is in @bad_params, $bad_param_lookup{$i} contains amount of $i in @bad_params

Cassidycassie answered 17/7, 2020 at 14:18 Comment(0)
S
0

There are two ways you can do this. You can use the throw the values into a hash for a lookup table, as suggested by the other posts. ( I'll add just another idiom. )

my %bad_param_lookup;
@bad_param_lookup{ @bad_params } = ( 1 ) x @bad_params;

But if it's data of mostly word characters and not too many meta, you can dump it into a regex alternation:

use English qw<$LIST_SEPARATOR>;

my $regex_str = do { 
    local $LIST_SEPARATOR = '|';
    "(?:@bad_params)";
 };

 # $front_delim and $back_delim being any characters that come before and after. 
 my $regex = qr/$front_delim$regex_str$back_delim/;

This solution would have to be tuned for the types of "bad values" you're looking for. And again, it might be totally inappropriate for certain types of strings, so caveat emptor.

Shellback answered 18/5, 2010 at 20:35 Comment(1)
You can also write @bad_param_lookup{@bad_params} = (), but you'd need to use exists to test membership.Nonrigid
P
0

When smartmatch is available (still marked experimental) please use that (see other answers). When it is not an exact membership test without extra dependencies seems to be missing from the answers thus far, so here it is:

grep { $_ eq $value } @array

If you insist on using regular expressions, the safe way would be

grep /^\Q$value\E$/, @array
Persist answered 17/10, 2023 at 13:53 Comment(0)
L
-1
my @badparams = (1,2,5,7,'a','zzz');

my $badparams = join('|',@badparams);   # '|' or any other character not present in params

foreach my $par (4,5,6,7,'a','z','zzz')
{
    if ($badparams =~ /\b$par\b/)
    {
        print "$par is present\n";
    }
    else
    {
        print "$par is not present\n";
    }
}

You may want to check for numerical leading spaces consistancy

Latrishalatry answered 21/5, 2014 at 18:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.