How to produce cartesian product in bash?
Asked Answered
D

3

14

I want to produce such file (cartesian product of [1-3]X[1-5]):

1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5

I can do this using nested loop like:

for i in $(seq 3) 
do
  for j in $(seq 5)
  do
      echo $i $j
  done
done

is there any solution without loops?

Drabeck answered 29/4, 2014 at 11:11 Comment(0)
J
22

Combine two brace expansions!

$ printf "%s\n" {1..3}" "{1..5}
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5

This works by using a single brace expansion:

$ echo {1..5}
1 2 3 4 5

and then combining with another one:

$ echo {1..5}+{a,b,c}
1+a 1+b 1+c 2+a 2+b 2+c 3+a 3+b 3+c 4+a 4+b 4+c 5+a 5+b 5+c
Jagannath answered 29/4, 2014 at 11:12 Comment(6)
does {1..3} expand by shell?Drabeck
Yes, {1..3} is same as seq 3 or seq 1 3, just that it comes with shell.Jagannath
is there any other using paste?Drabeck
Not that I know, and I don't see the need. You can maybe use echo {1..3}" "{1..5} | xargs -n 2.Jagannath
why does eg echo {1..2}{3..4} produce the full elm cross product 13 14 23 24 , as opposed to 1 23 24 or 13 23 4?Lowminded
For those looking for discrete elements, e.g. printf "%s\n" {a,e,z}" "{x,9,u}Foreordination
C
14

A shorter (but hacky) version of Rubens's answer:

join -j 999999 -o 1.1,2.1 file1 file2

Since the field 999999 most likely does not exist it is considered equal for both sets and therefore join have to do the Cartesian product. It uses O(N+M) memory and produces output at 100..200 Mb/sec on my machine.

I don't like the "shell brace expansion" method like echo {1..100}x{1..100} for large datasets because it uses O(N*M) memory and can when used careless bring your machine to knees. It is hard to stop because ctrl+c does not interrupts brace expansion which is done by the shell itself.

Colburn answered 6/12, 2017 at 13:15 Comment(2)
This is the most generic solution as it works with files and not just a sequence of numbersWitkowski
The -j command is GNU core util only. POSIX needs a field specification for each file... Even shorter still: join -j 999 <(seq 3) <(seq 5) to produce the OP's output.Intermigration
E
10

The best alternative for cartesian product in bash is surely -- as pointed by @fedorqui -- to use parameter expansion. However, in case your input that is not easily producible (i.e., if {1..3} and {1..5} does not suffice), you could simply use join.

For example, if you want to peform the cartesian product of two regular files, say "a.txt" and "b.txt", you could do the following. First, the two files:

$ echo -en {a..c}"\tx\n" | sed 's/^/1\t/' > a.txt
$ cat a.txt
1    a    x
1    b    x
1    c    x

$ echo -en "foo\nbar\n" | sed 's/^/1\t/' > b.txt
$ cat b.txt
1    foo
1    bar

Notice the sed command is used to prepend each line with an identifier. The identifier must be the same for all lines, and for all files, so the join will give you the cartesian product -- instead of putting aside some of the resultant lines. So, the join goes as follows:

$ join -j 1 -t $'\t' a.txt b.txt | cut -d $'\t' -f 2-
a    x    foo
a    x    bar
b    x    foo
b    x    bar
c    x    foo
c    x    bar

After both files are joined, cut is used as an alternative to remove the column of "1"s formerly prepended.

Emogeneemollient answered 29/4, 2014 at 12:28 Comment(3)
what u write as join is really a join (en.wikipedia.org/wiki/…) i do not need join. what i want is cartesian product (en.wikipedia.org/wiki/Cartesian_product).Drabeck
@طاهر Well, when you join each and every row from one table with the rows from another table, i.e., when you do a cross join, your output is a cartesian product.Emogeneemollient
The benefit of this solution is that bash doesn't inherently allow variables to be used in brace expansions. You can use variables in brace expansion with eval, but then you're using eval.Workbook

© 2022 - 2024 — McMap. All rights reserved.