Why do the following two duplicate-finder algorithms have different time complexities?
I was reading this question. The selected answer contains the following two algorithms. I couldn't understand why the first one's time complexity is O(ln(n)). In the worst case, if the array doesn't contain any duplicates, it will loop n times, and so does the second one. Am I wrong, or am I missing something? Thank you.

1) A faster (in the limit) way

Here's a hash based approach. You gotta pay for the autoboxing, but it's O(ln(n)) instead of O(n^2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)

boolean duplicates(final int[] zipcodelist)
{
  Set<Integer> lump = new HashSet<Integer>();
  for (int i : zipcodelist)
  {
    if (lump.contains(i)) return true;
    lump.add(i);
  }
  return false;
}
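As a side note (not part of the original answer): `Set.add` already returns `false` when the element is present, so the separate `contains` lookup can be folded into it. A minimal sketch of that variant:

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicatesDemo {
    // Same hash-based check as above, but using the boolean return value
    // of Set.add: it returns false when the element was already present.
    static boolean duplicates(final int[] zipcodelist) {
        Set<Integer> lump = new HashSet<Integer>();
        for (int i : zipcodelist) {
            if (!lump.add(i)) return true; // duplicate found
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(duplicates(new int[]{10001, 10002, 10001})); // true
        System.out.println(duplicates(new int[]{10001, 10002, 10003})); // false
    }
}
```

This halves the number of hash lookups per element, though the asymptotic complexity is unchanged.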

2) Bow to HuyLe

See HuyLe's answer for a more or less O(n) solution, which I think needs a couple of add'l steps:

static boolean duplicates(final int[] zipcodeList) {
    final int MAXZIP = 99999;
    boolean[] bitmap = new boolean[MAXZIP + 1]; // Java initializes boolean arrays
                                                // to false, so no Arrays.fill needed

    for (int item : zipcodeList) {
        if (!bitmap[item]) bitmap[item] = true;
        else return true;
    }

    return false;
}
Ongun answered 23/6, 2012 at 2:14 Comment(5)
I think for the "A faster (in the limit) way", the author typoed and meant to write O(n*log(n)), which is what you'd get from e.g. a set backed by a balanced binary search tree. It's pretty obvious that there can't be a way faster than O(n). – Recognizee
I didn't get the log(n) part. – Ongun
In a balanced BST, insert and lookup are O(log(size)), so if you do n lookups and inserts, you have O(n*log n) operations. I guess that's where the log comes from. – Recognizee
If you see the answer below, the author is saying it is O(n), but you are saying it is O(n*log(n)). I can't decide which one is right. Can you see the answer please? – Ongun
It depends on the implementation of the set type used. For a set implemented using a balanced BST, it's O(n*log n). For a HashSet, it's O(n) if the HashSet keeps the O(1) insert and lookup promise. (It can degrade to O(n^2) for bad implementations of a HashSet, but I'm optimistic that it won't for java.util.HashSet.) My guess is that the author of the answer to the linked question thought of a tree set first and typo'ed O(n*log n) as O(log n) and later decided to use a HashSet, forgetting to change the complexity. Or originally had the right complexity and miscorrected. – Recognizee
The first solution should have expected complexity of O(n), since the whole zip code list must be traversed, and processing each zip code is O(1) expected time complexity.

Even taking into consideration that insertion into a HashSet (which is backed by a HashMap) may trigger a re-hash, the amortized complexity is still O(1). This is somewhat beside the point, since there may be no relation between Java's HashMap and the assumption in the link, but it is there to show that it is possible.

From HashSet documentation:

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.

It's the same for the second solution, which is correctly analyzed: O(n).

(Just an off-topic note: BitSet is faster than the boolean array seen in the original post, since 8 boolean flags are packed into 1 byte, which uses less memory and is more cache-friendly.)
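To illustrate that note, here's a sketch (not from the original answer) of the second solution rewritten with `java.util.BitSet`, which packs the "seen" flags into bits instead of one-byte booleans:

```java
import java.util.BitSet;

public class BitSetDuplicates {
    // Bitmap approach from the second solution, backed by a BitSet:
    // roughly 12.5 KB of flags for the 0..99999 range instead of ~100 KB
    // of booleans, with the same O(n) time complexity.
    static boolean duplicates(final int[] zipcodeList) {
        final int MAXZIP = 99999;
        BitSet seen = new BitSet(MAXZIP + 1); // all bits start cleared
        for (int item : zipcodeList) {
            if (seen.get(item)) return true; // duplicate found
            seen.set(item);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(duplicates(new int[]{12345, 54321, 12345})); // true
        System.out.println(duplicates(new int[]{12345, 54321}));        // false
    }
}
```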

Villus answered 23/6, 2012 at 2:26 Comment(4)
i.e. both have the same time complexity, O(n). Right? – Ongun
@WowBow: Time complexity is the same (assuming no severe collision). – Villus
They shouldn't be if a hash is used and implemented properly. This answer is not necessarily correct! – Christadelphian
Apologies, I was looking at testing of the WHOLE set, not the singular input. I was looking at it incorrectly as the testing of a set of inputs against another set. +1 Apparently in my tired, just-had-a-baby state, I can still be easily SCHOOLED :-) – Christadelphian
