In Perl, how can I generate all possible combinations of a list?
Asked Answered
P

7

20

I have a file with a list, and a need to make a file that compares each line to the other. for example, my file has this:

AAA  
BBB  
CCC  
DDD  
EEE

I would like the final list to look like this:

AAA  BBB  
AAA  CCC  
AAA  DDD  
AAA  EEE  
BBB  CCC  
BBB  DDD  
BBB  EEE  
CCC  DDD  
CCC  EEE  
DDD  EEE

I am trying to do this in Perl, for this first time and am having a little trouble. I do know that you need to make an array, and then split it, but after that I am having some trouble.

Projection answered 24/4, 2012 at 14:24 Comment(0)
J
30

Use Algorithm::Combinatorics. The iterator based approach is preferable to generating everything at once.

#!/usr/bin/env perl

use strict; use warnings;
use Algorithm::Combinatorics qw(combinations);

my $strings = [qw(AAA BBB CCC DDD EEE)];

my $iter = combinations($strings, 2);

while (my $c = $iter->next) {
    print "@$c\n";
}

Output:

AAA BBB
AAA CCC
AAA DDD
AAA EEE
BBB CCC
BBB DDD
BBB EEE
CCC DDD
CCC EEE
DDD EEE
Juliannajulianne answered 24/4, 2012 at 14:48 Comment(0)
B
10

It is straightforward to write this using recursion.

This code example demonstrates.

use strict;
use warnings;

my $strings = [qw(AAA BBB CCC DDD EEE)];

sub combine;

print "@$_\n" for combine $strings, 5;

sub combine {

  my ($list, $n) = @_;
  die "Insufficient list members" if $n > @$list;

  return map [$_], @$list if $n <= 1;

  my @comb;

  for my $i (0 .. $#$list) {
    my @rest = @$list;
    my $val  = splice @rest, $i, 1;
    push @comb, [$val, @$_] for combine \@rest, $n-1;
  }

  return @comb;
}

Edit

My apologies - I was generating permutations instead of combinations.

This code is correct.

use strict;
use warnings;

my $strings = [qw(AAA BBB CCC DDD EEE)];

sub combine;

print "@$_\n" for combine $strings, 2;

sub combine {

  my ($list, $n) = @_;
  die "Insufficient list members" if $n > @$list;

  return map [$_], @$list if $n <= 1;

  my @comb;

  for (my $i = 0; $i+$n <= @$list; ++$i) {
    my $val  = $list->[$i];
    my @rest = @$list[$i+1..$#$list];
    push @comb, [$val, @$_] for combine \@rest, $n-1;
  }

  return @comb;
}

output

AAA BBB
AAA CCC
AAA DDD
AAA EEE
BBB CCC
BBB DDD
BBB EEE
CCC DDD
CCC EEE
DDD EEE
Bedsore answered 24/4, 2012 at 16:20 Comment(0)
S
7

Take a look at Math::Combinatorics - Perform combinations and permutations on lists

example copying from the CPAN:

use Math::Combinatorics;

  my @n = qw(a b c);
  my $combinat = Math::Combinatorics->new(count => 2,
                                          data => [@n],
                                         );

  print "combinations of 2 from: ".join(" ",@n)."\n";
  print "------------------------".("--" x scalar(@n))."\n";
  while(my @combo = $combinat->next_combination){
    print join(' ', @combo)."\n";
  }

  print "\n";

  print "permutations of 3 from: ".join(" ",@n)."\n";
  print "------------------------".("--" x scalar(@n))."\n";
  while(my @permu = $combinat->next_permutation){
    print join(' ', @permu)."\n";
  }

  output:
combinations of 2 from: a b c
  ------------------------------
  a b
  a c
  b c

  permutations of 3 from: a b c
  ------------------------------
  a b c
  a c b
  b a c
  b c a
  c a b
  c b a
Sacerdotalism answered 24/4, 2012 at 14:33 Comment(1)
Why don't you use the example data from the question?Merited
B
2

Here's a hack using glob:

my @list = qw(AAA BBB CCC DDD EEE);

for my $i (0..$#list-1) {
    print join "\n", glob sprintf "{'$list[$i] '}{%s}",
          join ",", @list[$i+1..$#list];
    print "\n";
}

The output:

AAA BBB
AAA CCC
AAA DDD
AAA EEE
BBB CCC
BBB DDD
BBB EEE
CCC DDD
CCC EEE
DDD EEE

P.S. you may want to use Text::Glob::Expand or String::Glob::Permute modules instead of plain glob() to avoid the caveat of matching files in the current working directory.

Broody answered 24/4, 2012 at 15:8 Comment(1)
The glob trick should always be accompanied by the various caveats for when it fails.Merited
C
2

I benchmarked the following Perl modules:

  1. Math::Combinatorics
  2. Algorithm::Combinatorics
  3. Cmb

Benchmark consisted of doing what the OP asked, combinations of 2 items, but ramping the set of words up to 10,000 instead of just the original 5 requested (AAA BBB CCC DDD EEE).

Test script for Math::Combinatorics

#!/usr/bin/env perl
use strict; use warnings;
use Math::Combinatorics;
my $strings = [qw(AAA BBB CCC DDD EEE) x 2000];
my $iter = new Math::Combinatorics (count => 2, data => $strings);
while (my @c = $iter->next_combination) {
    print "@c\n";
}

This produced ~53,479 combinations per-second.

Test script for Algorithm::Combinatorics

#!/usr/bin/env perl
use strict; use warnings;
use Algorithm::Combinatorics qw(combinations);
my $strings = [qw(AAA BBB CCC DDD EEE) x 2000];
my $iter = combinations($strings, 2);
while (my $c = $iter->next) {
    print "@$c\n";
}

This produced ~861,982 combinations per-second.

Test script for Cmb

#!/usr/bin/env perl
use strict; use warnings;
use Cmb;
my $strings = [qw(AAA BBB CCC DDD EEE) x 2000];
my $cmb = new Cmb { size_min => 2, size_max => 2 };
$cmb->cmb_callback($#$strings + 1, $strings, sub {
    print "@_\n";
    return 0;
});

This produced ~2,940,882 combinations per-second.

But if you just need to print the combinations, Cmb can actually do that even faster than the above.

#!/usr/bin/env perl
use strict; use warnings;
use Cmb;
my $strings = [qw(AAA BBB CCC DDD EEE) x 2000];
my $cmb = new Cmb { size_min => 2, size_max => 2 };
$cmb->cmb($#$strings + 1, $strings);

This produced ~3,333,000 combinations per-second.

Benchmarks were performed using dpv on CentOS Linux release 7.7.1908 (Core) under kernel 3.10.0-1062.1.1.el7.x86_64 x86_64 using Perl 5.16.3 on an Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

Chiclayo answered 6/12, 2019 at 3:52 Comment(0)
T
0

How about:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(dump);

my @in = qw(AAA BBB CCC DDD EEE);
my @list;
while(my $first = shift @in) {
    last unless @in;
    my $rest = join',',@in;
    push @list, glob("{$first}{$rest}");
}
dump @list;

output:

(
  "AAABBB",
  "AAACCC",
  "AAADDD",
  "AAAEEE",
  "BBBCCC",
  "BBBDDD",
  "BBBEEE",
  "CCCDDD",
  "CCCEEE",
  "DDDEEE",
)
Temuco answered 24/4, 2012 at 14:34 Comment(5)
The glob trick should always be accompanied by the various caveats for when it fails.Merited
@daxim: Do you mean the "side-effect" of matching files in the current working directory? If so, isn't this perfectly safe since he's not using ?, [] or *?Profiteer
All of that. I am annoyed now, the caveats should be clearly laid out as part of the answer, not rhetoric questions attached as a comment with low visibility. It's not a "side-effect", it really happens, modalising the word is wrong. It's not safe: obviously the user provided made-up/anonymised data in the question and will be in for a bad surprise under real world conditions. SO answers should strive to not set people up for failure, they should be always aware of subtleties and risks; given that, I have now downvoted this answer to give M42 an incentive to improve it. -- continued:Merited
I recommend Text::Glob::Expand or String::Glob::Permute over plain glob, if only that the documentation is better and they do the manipulation on in-memory data structures, not influenced by external factors like the shell or what's in the current directory.Merited
@daxim: That's a good point. Quoting side-effect was meant as a joke though: I agree that SO isn't the best place to be teaching these kinds of tricks.Profiteer
S
-1
  1. take first string
  2. iterate over array from next position to end
    1. attach next string to original string
  3. take next string and go back to step 2
Skimmer answered 24/4, 2012 at 14:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.