Computing set intersection in linear time?

Asked 9/1, 2011 at 22:11 Answered 25/2, 2014 at 17:58

algorithm math data-structures big-o set-intersection

Is there an algorithm that, given two sets, computes their intersection in linear time?

I can run two for loops to check all pairs of elements, recording elements that I find in both of the sets. However, the runninng time will be O(n²). How do I do this in O(n) time?

Wain answered 9/1, 2011 at 22:11 Comment(5)

WHY would it ever be n^2? Isn't the "obvious" solution in O(n) and we should be trying to find a better one? – Unerring 21/7, 2017 at 6:41

@Unerring To do better than O(n), you must not be visiting every element of a set. How could you possibly determine anything about a set element without having looked at that element? – Industrialize 3/12, 2019 at 8:44

Earth to @pete, as explicitly noted in the question, iterating over each pair of elements as a naive solution is O(n^2). – Baumbaugh 17/6, 2022 at 8:17

@Kröw naive solution is to use HashSet lookups which are O(1), so total would be O(n) – Unerring 17/6, 2022 at 20:51

@Unerring That is not naive at all. Please read the question in full. – Baumbaugh 17/6, 2022 at 22:27

That depends on your set implementation.

If you have a hash set (O(1) lookup), then the approach indicated by all the other posters is correct. Iterate across all the elements in the first set. If it's in the second set, then add it to the result. This runs in O(n) time.

If you have a tree set (O(lg n) lookup), then this approach will work, but it runs in O(n lg n) time. You can do better; there's an O(n) solution. I assume that you have some sort of iterator that can traverse the elements of the two sets in ascending order. If you do, then the question is "given two lists in sorted order, find their intersection." This can be done using a modified version of the algorithm you use to merge two ranges. The idea is to keep track of the two iterators. At each step, compare the first elements of the ranges. If they're equal, add the element to the intersection and advance both iterators forward. If the first is less than the second, then advance the first iterator. If the first element is greater, then advance the second iterator. This runs in time O(n) because each iteration consumes at least one element, and there's only O(n) elements in total.

Janae answered 9/1, 2011 at 22:18 Comment(4)

The two sets are not sorted.What If I use merge sort technique and then follow your method of comparing the first elements of the ranges,but im not sure it will work in O(n) bcoz merge sort takes O(nlogn)time – Wain 9/1, 2011 at 22:27

If you put the sets in sorted order in O(n lg n) time, this technique should work out, but it won't run in linear time because of the sort overhead. How are your sets represented? – Janae 9/1, 2011 at 22:37

Just iterate over both lists, adding them to a hash table, then apply the "intersect two hash tables" algorithm. – Janae 9/1, 2011 at 22:55

I think i got it nw.Thank you – Wain 9/1, 2011 at 23:0

I wonder nobody mentioned hashtable.
Regardless of your set implementation (even if 'set' here means a simple array), you can

put contents of the first set into hashtable and
iterate over second set, checking if hashtable contains current element.

O(n)

Lolita answered 9/1, 2011 at 22:55 Comment(1)

The only thing I don't like about this solution is that it doesn't scale to more than two sets. – Bouchard 17/10, 2019 at 13:0

intersection(a, b):
  result = new empty set
  for x in b:
    if a contains x:
      add x to result

  return result

If the contains test is constant time (such as in a set that uses a hash table as an implementation), then this algorithm is O(n).

Shanley answered 9/1, 2011 at 22:14 Comment(2)

If a and b are the same sets then algorithm takes O(n^3) time since search in "b" and adding an element to "result" takes O(n) time. Adding an element to array based set takes O(n) time since sets cannot have duplicated element. The duplication check is done on each addition. If result and b is based on hashtable then even if you cannot be sure that search and addition takes O(1) time, it can take O(n) as well. – Ventage 19/5, 2016 at 11:59

@Ventage even if the sets are equal, it won't take O(n^3) but O(3n) = O(n) steps, assuming that contains and insert are constant (=O(1)). – Cryptonym 8/3, 2022 at 16:6

Combine the two arrays and count the no of occurrences of each element in this combined array and put these in a new array. Then check this count array for entries which contain 2, those elements are in intersection of the two sets.

Exiguous answered 25/2, 2014 at 17:58 Comment(0)

For all elements in set 1: Check if that element is in set 2. You can implement a Set that has amortized O(1) lookup time.

Hanuman answered 9/1, 2011 at 22:13 Comment(1)

Alternative option if you're using a sorted-tree based set instead of a hash-based one: Iterate over both sets in sorted order, advancing to the next element of the currently "lowest" set. O(n) in the total number of elements. – Hanuman 9/1, 2011 at 22:17

if one of the two lists is ordered, then we can start with the unordered list

FUNCTION: INTERSECTION ( LIST A, LIST B )
{
   CREATE C AS EMPTY LIST

   FOR EVERY: NUMBER n IN A
   {
        IF BINARY-SEARCH(n) IN B
        {
            ADD n TO C
        }
   }

   RETURN C
}

Time Complexity = O(n O(BINARY-SEARCH)) = O(n log n)

if list B is hashed, then we've BIG-THETA(C n + T(hash))

where BIG-THETA is the asymptotic average, and C is a constant and T(hash) is time needed for the hash function

Mccall answered 15/6, 2012 at 13:13 Comment(0)

Recommended topics

Hot tags