Algorithms - Count all pairs of equal numbers in a sorted array in O(n)?
Asked Answered
C

6

6

A question that has me speculating is the following:

Let's say we have a sorted array with the numbers {1,1,1,1,2,2,4,4,4}.

Now, given that we can clearly see that we have six pairs on 1's, one pair of 2's and three pairs of 4's (10 pairs). How would you construct an algorithm that finds these pairs in O(n)?

I have an algorithm that counts the pairs in an array and does so like this:

Arrays.sort(array);
int counter = 0; 
for(int i = 0; i<array.length; i++) {
     for(int j = i+1; j<array.length; j++) {
          if(j!=i && array[j] == array[i]) {
	counter++;
	}
      }
}
return counter; 

But this runs in O(N^2) and I guess (being the novice that I am with algorithms) that there's a better solution to this to get the same results using only one for-loop or multiple while-loops?

I'd like to hear your thoughts!

Costanzo answered 26/9, 2017 at 12:2 Comment(6)
"clearly see that we have six pairs on 1's" <- You lost me right there.Leatherwood
With the 6 pairs of 1 you mention, do you mean 4 choose 2 = 6? How are there 2 pairs of 2? Please explain in more detail.Attachment
clearly you say...Kierstenkieselguhr
To clarify - lets say we have ABCD. Then the different combinations of the characters are AB, AC, AD, BC, BD, CD. Now if you do the same for the 1's - you'll see what I mean :)Osmunda
Note that the original post was slightly different compared to the current, edited version of the post. With "six pairs of ones" the OP indeed means n choose 2. It is now changed to "occurrence of numbers", but that changes the problem slightly.Darkish
@MCEmperor, thanks! Rolled back to the last author's editStarinsky
S
4

You can do it in O(N):

int pairs = 0; 
int times = 1;
for (int i = 1; i < array.length; ++i) {
    if (array[i] == array[i-1]) {
        ++times;
    } else {
        pairs += times*(times-1) / 2;
        times = 1;
    }                   
}
pairs += times*(times-1) / 2;
return pairs;       

Runnable: https://ideone.com/mu65ie

For every distinct number, count the number of its occurrences times. The number of different pairs is equal to number of choices C(times, 2) = times*(times-1) / 2.

Starinsky answered 26/9, 2017 at 12:8 Comment(1)
Sorry Dale, commented on the wrong code! :) Your solution is the one that works fine to find the 1's ! Understand your confusion mate!Osmunda
M
2

Okay so here's also my solution:

 int i = 0;
 int pairs = 0;

 while (i < array.length - 1) {
    if(array[i] == array[i + 1]) {
        pairs += 1;
        i += 2;
    } else {
        i++;
    }
 }

When a pair is found the index is incremented by two, this makes traversal a little bit faster. But the complexity is O(n) anyway.

Of course you run this after the array is sorted.

Muzzle answered 26/9, 2017 at 12:16 Comment(1)
How is this O(N) ?Hastie
P
2

The secret is to stop reiterating. Cache data as it appears. You can use caching to reduce this into an O(nlogn) problem.

Pairs is very vague wording, so in the future a few more examples will clarify things you don't know a name to find answers for. You can use mathematics of Combinations to reduce the problem into O(n).

The wikipedia article is a bit obtuse to read, but the equation you're looking for is near the top:

n! / (k! * (n - k)!)

where ! indicates a factorial number, n indicates the amount of items to be combined (4 1s), and k indicates the amount of items per combination (2 for a pair). So substituting these values:

4! = 24
2! = 2
(4-2)! = 2
4!/(2!2!) = 24/4 = 6

Using this equation it can be reduced to O(n). Since factorials are used and the dataset is sorted, you can further improve performance by caching the result of the factorial call for future calls. Sorted input for a factorial function will have cache hits for nearly every lookup!

Caching may not be necessary if you are using python 3 as it uses a much more efficient algorithm to calculate factorials than python 2. Caching will reduce redundancy, however which may yield a good result on very large values.

An example of caching (memoization):

import math

class Cache(object):
    memos = dict()

    def factorial(self, n):
         if not n in self.memos:
             self.memos[n] = math.factorial(n)
         return self.memos[n]
Peachy answered 26/9, 2017 at 12:51 Comment(0)
H
1

How about:

    Arrays.sort(array);
    int counter = 0; 
    for(int i = 1; i<array.length; i++) {
        if(array[i] == array[i-1]) {
            counter++;
            ++i;
        }
    }
    return counter;
Hydr answered 26/9, 2017 at 12:20 Comment(0)
P
1

This is my way of doing this. Hope it will help somebody :)

static int foo(int[] ar) {      
    int count = 0;
    Arrays.sort(ar);
    for(int i = 0; i<ar.length-1;i++)
    {
            if(ar[i] == ar[i+1])
            {
                count ++;
                i++;
            }
    }
    return count;

}
Plainclothesman answered 25/7, 2019 at 17:45 Comment(0)
A
0

Another way of doing it in O(n). You can count all the pairs as you scan along the array.

int countPairs(vector<int> arr) {
  int count = 0;
  int i = 0;
  while (i < arr.size()) {
    int j = i + 1;
    while (j < arr.size() && arr[j] == arr[i]) {
      count += j - i;
      j++;
    }
    i = j;
  }

  return count;
}

Runnable at - https://ideone.com/TFoZdv

Altorelievo answered 2/7 at 17:35 Comment(3)
This is just a copy of other answers, except it uses a vectorAfterdamp
Hmm.. I think Aris and Mauri Perry's code are returning unique number of pairs. Whereas mine is returning all possible one as the OP asked. This is same as the one Schidu luca posted, just not using any multiplication.Altorelievo
Maurice's code - ideone.com/ZD0DVu Aris's code - ideone.com/BD0oyjAltorelievo

© 2022 - 2024 — McMap. All rights reserved.