Is there any efficient easy way to compare two lists with the same length with Mathematica?
Asked Answered
B

5

12

Given two lists A={a1,a2,a3,...an} and B={b1,b2,b3,...bn}, I would say A>=B if and only if all ai>=bi.

There is a built-in logical comparison of two lists, A==B, but no A>B. Do we need to compare each element like this

And@@Table[A[[i]]>=B[[i]],{i,n}]

Any better tricks to do this?

EDIT: Great thanks for all of you.

Here's a further question:

How to find the Maximum list (if exist) among N lists?

Any efficient easy way to find the maximum list among N lists with the same length using Mathematica?

Babi answered 16/1, 2012 at 19:52 Comment(0)
F
18

Method 1: I prefer this method.

NonNegative[Min[a - b]]

Method 2: This is just for fun. As Leonid noted, it is given a bit of an unfair advantage for the data I used. If one makes pairwise comparisons, and return False and Break when appropriate, then a loop may be more efficient (although I generally shun loops in mma):

result = True;
n = 1; While[n < 1001, If[a[[n]] < b[[n]], result = False; Break[]]; n++]; result

Some timing comparisons on lists of 10^6 numbers:

a = Table[RandomInteger[100], {10^6}];
b = Table[RandomInteger[100], {10^6}];

(* OP's method *)
And @@ Table[a[[i]] >= b[[i]], {i, 10^6}] // Timing

(* acl's uncompiled method *)
And @@ Thread[a >= b] // Timing

(* Leonid's method *)
lessEqual[a, b] // Timing

(* David's method #1 *)
NonNegative[Min[a - b]] // Timing

timings 2


Edit: I removed the timings for my Method #2, as they can be misleading. And Method #1 is more suitable as a general approach.

Faustina answered 16/1, 2012 at 20:7 Comment(10)
Note that since you generate random lists, your last method (in which there are strange hard-coded constants, but I assume those are remnants from your development code. Anyways, they are making it wrong here, but let's ignore that for a moment) gets an unfair advantage, since statistically the condition violation will happen very early in the loop. You should also test for the lists where the result would be True - in which case top-level lists are guaranteed to be slower. In fact, your last method is a top-level variation of the compiled code of @acl.Mccoy
This of course does not detract from your beautiful and fast first method, which I admire.Mccoy
Well, the last one is much faster because it short-circuits, breaking as soon as any two elements don't match. Try it in the worst case (ie, when they must all be compare, eg a=b), and it's by far the slowest. The fastest is the compiled method cmp, I think.Kendy
By the way, in many cases, a compiled (to C) loop is more efficient than anything else you could do. Uncompiled top-level loops tend to be the slowest overall (unless you are using a much more efficient algorithm, as you are here).Kendy
@Leonid. I also prefer Method #1. Method #2 was to examine the speed advantage of Th breaking early out of the loop. You are quite correct about the list generation: If numbers are randomly assigned to a, b, then Method #2 gets an unfair advantage.Faustina
@Kendy We do need collaborative comments (like in piratePad), don't we :) ?Mccoy
@Kendy I don't know how to run your compiled code but would gladly include it in the comparison if you tell me how. Do I need to have C installed on my machine?Faustina
@David It didn't occur to me that a C compiler may not be available on some systems. If you don't have one, you may replace "C" with "WVM". This, though, is 30 times slower here. When I run all the examples, the compile to C one is by far the fastest, as expected (I'm talking about worst-case scenarios).Kendy
@Kendy Yes, compiled C is very fast.Faustina
Very cool! Great Thanks for all of you. This is a wonderful day.Babi
K
8

For instance,

And @@ Thread[A >= B]

should do the job.

EDIT: On the other hand, this

cmp = Compile[
  {
   {a, _Integer, 1},
   {b, _Integer, 1}
   },
  Module[
   {flag = True},
   Do[
    If[Not[a[[p]] >= b[[p]]], flag = False; Break[]],
    {p, 1, Length@a}];
   flag],
  CompilationTarget \[Rule] "C"
  ]

is 20 times faster. 20 times uglier, too, though.

EDIT 2: Since David does not have a C compiler available, here are all the timing results, with two differences. Firstly, his second method has been fixed to compare all elements. Secondly, I compare a to itself, which is the worst case (otherwise, my second method above will only have to compare elements up to the first to violate the condition).

(*OP's method*)
And @@ Table[a[[i]] >= b[[i]], {i, 10^6}] // Timing

(*acl's uncompiled method*)
And @@ Thread[a >= b] // Timing

(*Leonid's method*)
lessEqual[a, b] // Timing

(*David's method #1*)
NonNegative[Min[a - b]] // Timing

(*David's method #2*)
Timing[result = True;
 n = 1; While[n < Length[a], 
  If[a[[n]] < b[[n]], result = False; Break[]];
  n++]; result]

(*acl's compiled method*)
cmp[a, a] // Timing

enter image description here

So the compiled method is much faster (note that David's second method and the compiled method here are the same algorithm, and the only difference is overhead).

All these are on battery power so there may be some random fluctuations, but I think they are representative.

EDIT 3: If, as ruebenko suggested in a comment, I replace Part with Compile`GetElement, like this

cmp2 = Compile[{{a, _Integer, 1}, {b, _Integer, 1}}, 
  Module[{flag = True}, 
   Do[If[Not[Compile`GetElement[a, p] >= Compile`GetElement[b, p]], 
     flag = False; Break[]], {p, 1, Length@a}];
   flag], CompilationTarget -> "C"]

then cmp2 is a twice as fast as cmp.

Kendy answered 16/1, 2012 at 20:2 Comment(6)
+1. You could probably make this a bit simpler: cmp1 = Compile[{{a, _Integer, 1}, {b, _Integer, 1}}, Do[If[a[[p]] < b[[p]], Return[False]], {p, 1, Length@a}] != False, CompilationTarget -> "C"].Mccoy
Compilation is clearly faster. I'm surprised, nonetheless, that my method #1 did as well as it did.Faustina
@David perhaps it is so fast because DeveloperPackedArrayQ[a - b]` returns TrueKendy
For fun, try Compile`GetElement[a,p] instead of Part[a,p] and the second one.Narghile
My method #2 is definitely not the way to go for large lists when the result is True.Faustina
@David except if compiled, when it's the fastest (it's the same as cmp above). Especially with ruebenko's undocumented trick!Kendy
M
4

Since you mentioned efficiency as a factor in your question, you may find these functions useful:

ClearAll[lessEqual, greaterEqual];
lessEqual[lst1_, lst2_] :=
   SparseArray[1 - UnitStep[lst2 - lst1]]["NonzeroPositions"] === {};

greaterEqual[lst1_, lst2_] :=
   SparseArray[1 - UnitStep[lst1 - lst2]]["NonzeroPositions"] === {};

These functions will be reasonably efficient. The solution of @David is still two-four times faster, and if you want extreme speed and your lists are numerical (made of Integer or Real numbers), you should probably use compilation to C (the solution of @acl and similarly for other operators).

You can use the same techniques (using Unitize instead of UnitStep to implement equal and unequal), to implement other comparison operators (>, <, ==, !=). Keep in mind that UnitStep[0]==1.

Mccoy answered 16/1, 2012 at 20:37 Comment(4)
Your functions are more than 4 times slower than mine, according to my timing results.Faustina
@David I think that depends on the data. In my tests, I had 2 times difference. But yes, they are slower (than your first function at least). I will still keep this answer though, since it illustrates how SparseArray-s can be used, but I agree that your solution is superior.Mccoy
I mentioned the timing ratio based on Method #1 (since Method #2 is really just for fun).Faustina
@David I also meant method#1. I have: largeTst1 = RandomInteger[1000, 10000000];largeTst2 = RandomInteger[{999, 1990}, 10000000];In[95]:= NonNegative[Min[largeTst2-largeTst1]]//Timing Out[95]= {0.125,False}; In[96]:= greaterEqual[largeTst2,largeTst1]//Timing Out[96]= {0.234,False}.Mccoy
G
3

Comparison functions like Greater, GreaterEqual, Equal, Less, LessEqual can be made to apply to lists in a number of ways (they are all variations of the approach in your question).

With two lists:

 a={a1,a2,a3};
 b={b1,b2,b3};

and two instances with numeric entries

na={2,3,4}; nb={1,3,2}; 

you can use

And@@NonNegative[na-nb]

With lists with symoblic entries

And@@NonNegative[na-nb]

gives

NonNegative[a1 - b1] && NonNegative[a2 - b2] && NonNegative[a3 - b3]

For general comparisons, one can create a general comparison function like

listCompare[comp_ (_Greater | _GreaterEqual | _Equal | _Less | _LessEqual), 
         list1_List, list2_List] := And @@ MapThread[comp, {list1, list2}]

Using as

listCompare[GreaterEqual,na,nb]

gives True. With symbolic entries

listCompare[GreaterEqual,a,b]

gives the logially equivalent expression a1 <= b1 && a2 <= b2 && a3 <= b3.

Glassworks answered 16/1, 2012 at 20:49 Comment(0)
V
2

When working with packed arrays and numeric comparator such as >= it would be hard to beat David's Method #1.

However, for more complicated tests that cannot be converted to simple arithmetic another method is required.

A good general method, especially for unpacked lists, is to use Inner:

Inner[test, a, b, And]

This does not make all of the comparisons ahead of time and can therefore be much more efficient in some cases than e.g. And @@ MapThread[test, {a, b}]. This illustrates the difference:

test = (Print[#, " >= ", #2]; # >= #2) &;

{a, b} = {{1, 2, 3, 4, 5}, {1, 3, 3, 4, 5}};

Inner[test, a, b, And]
1 >= 1
2 >= 3

False
And @@ MapThread[test, {a, b}]
1 >= 1
2 >= 3
3 >= 3
4 >= 4
5 >= 5

False

If the arrays are packed and especially if the likelihood that the return is False is high then a loop such as David's Method #2 is a good option. It may be better written:

Null === Do[If[a[[i]] ~test~ b[[i]], , Return@False], {i, Length@a}]
1 >= 1
2 >= 3

False
Vellicate answered 17/1, 2012 at 7:33 Comment(4)
Actually it's not hard at all, there's a method in my answer that is roughly an order of magnitude faster... (good tip on Inner, I did not know about the efficiency advantage, +1)Kendy
@Kendy I don't count compiling to C. ;-) (I know I should probably get over it, but between not having it in v7 and the need to write in that awful procedural style I am just not ready to accept that as Mathematica.) Thanks for the vote.Vellicate
Why do you say "If the arrays are packed..." then a loop is a good choice? If I unpack a and set b=a, the unpacked-list version is a little bit slower than the packed one. ie, packing doesn't really affect Do and company's performance (although they do not seem to unpack). Or am I misunderstanding?Kendy
@Kendy I meant that there is a significant unpacking overhead when using Inner even if the very first pair does not pass test. Using Part in Do avoids this. If you understand what I mean, and can think of a better way to state it, please do.Vellicate

© 2022 - 2024 — McMap. All rights reserved.