Implementing an algorithm to determine if a string has all unique characters [closed]

Asked 28/6, 2013 at 4:47 Answered 7/10, 2017 at 19:6

Context: I'm a CS n00b working my way through "Cracking the Coding Interview." The first problem asks to "implement an algorithm to determine if a string has all unique characters." My (likely naive) implementation is as follows:

def isUniqueChars2(string):
  uchars = []
  for c in string:
    if c in uchars:
      return False
    else:
      uchars.append(c)
  return True

The author suggests the following implementation:

def isUniqueChars(string):
  checker = 0
  for c in string:
    val = ord(c) - ord('a')
    if (checker & (1 << val) > 0):
      return False
    else:
      checker |= (1 << val)
  return True

What makes the author's implementation better than mine (FWIW, the author's solution was in Java and I converted it to Python -- is my solution one that is not possible to implement in Java)? Or, more generally, what is desirable in a solution to this problem? What is wrong with the approach I've taken? I'm assuming there are some fundamental CS concepts (that I'm not familiar with) that are important and help inform the choice of which approach to take to this problem.

Knoxville answered 28/6, 2013 at 4:47 Comment(8)

Don't name variables dict, especially if the variable in question isn't even a dict! – Scroop 28/6, 2013 at 4:51

yours does a linear search because your dict is not a real dict – Moulding 28/6, 2013 at 4:51

There is absolutely nothing wrong with your solutions, it will work, it's a python way to solve this problem. – Panicstricken 28/6, 2013 at 4:51

Your solution works fine and is Pythonic. It would be quicker if you used a set rather than a list, for O(1) lookup. It's slower than the author's code, but far more Pythonic (presumably why the author's code wasn't written in Python...) – Scroop 28/6, 2013 at 4:54

I see you made an edit to rename dict to uchars. May I suggest you revert this change, since several comments refer to this mistake, and several replies have already addressed your actual question? Changing the question underneath just makes things confusing to new visitors. – Donets 28/6, 2013 at 4:57

Edited: changed dict to uchars. I had originally used a dictionary instead of a list. – Knoxville 28/6, 2013 at 4:58

The author's solution subtly assumes 100% lowercase characters. uppercase, space and other characters will break it. This makes your solution much better. – Artificial 28/6, 2013 at 5:29

More solutions here: #5278622 – Excrescence 28/6, 2013 at 9:16

Here is how I would write this:

def unique(s):
    return len(set(s)) == len(s)

Strings are iterable so you can pass your argument directly to set() to get a set of the characters from the string (which by definition will not contain any duplicates). If the length of that set is the same as the length of the original string then you have entirely unique characters.

Your current approach is fine and in my opinion it is much more Pythonic and readable than the version proposed by the author, but you should change uchars to be a set instead of a list. Sets have O(1) membership test so c in uchars will be considerably faster on average if uchars is a set rather than a list. So your code could be written as follows:

def unique(s):
    uchars = set()
    for c in s:
        if c in uchars:
            return False
        uchars.add(c)
    return True

This will actually be more efficient than my version if the string is large and there are duplicates early, because it will short-circuit (exit as soon as the first duplicate is found).

Trifocal answered 28/6, 2013 at 4:54 Comment(5)

I had thought the exact same thing, except mine was a lambda. Well done. – Elainaelaine 28/6, 2013 at 4:56

That was indeed pythonic ... – Artimas 28/6, 2013 at 5:5

Thank you for your response @F.J! Can you explain what you mean by "sets have O(1) membership test?" – Knoxville 28/6, 2013 at 5:17

This is called "big O notation", it is a way to represent the time complexity of an algorithm as the input size changes. So for example O(n) is linear, as the input size grows the run time will grow at a constant rate. O(1) is constant time, so "sets have O(1) membership test" means that even as the size of the set grows, it will always take a fixed time to see if an element exists in the set. – Trifocal 28/6, 2013 at 5:35

if c in uchars is also a O(1) test. – Corvin 17/10, 2016 at 18:3

Beautiful is better than ugly.

Your approach is perfectly fine. This is python, when there are a bajillion ways to do something. (Yours is more beautiful too :)). But if you really want it to be more pythonic and/or make it go faster, you could use a set, as F.J's answer has described.

The second solution just looks really hard to follow and understand.

(PS, dict is a built-in type. Don't override it :p. And string is a module from the standard library.)

Margetts answered 28/6, 2013 at 4:54 Comment(5)

True but it is definitely possible to write a beautiful solution to this problem that runs faster than his – Moulding 28/6, 2013 at 4:56

@Moulding I was actually comparing his to the second one – Margetts 28/6, 2013 at 4:57

I know the second one is faster and uglier, I am saying that you can achieve elegance and speed in this case – Moulding 28/6, 2013 at 4:58

@Moulding Yes, of course :)! – Margetts 28/6, 2013 at 4:58

besides I think that the bit shifting solution is beautiful in a way – Moulding 28/6, 2013 at 5:8

str = raw_input("Enter string: ")


def isUnique():
    test_dict = dict()
    is_unique = True
    for c in str:
        if(not test_dict.get(c, False)):
            test_dict[c] = c
        else:
            is_unique = False
            break
    if is_unique:
        return "String is unique"
    else:
        return "String is not unique"

print(isUnique())

Melantha answered 22/9, 2016 at 10:19 Comment(0)

Your solution is not incorrect but your variable dict is not actually a dictionary which means it has to do a linear search to check for the character. The solution from the book does the check in constant time. I will say that the other solution is obnoxiously unreadable because it uses setting the bits in a number to check if the char is unique or not

Moulding answered 28/6, 2013 at 4:54 Comment(0)

The solution you have translated from Java into Python is what's called a 'bit-twiddling' algorithm. The idea is that an integer can be treated in multiple ways: One, as a number. Two, as a collection of bits (32 off/ons, or 64, or what-have-you). The algorithm bit-twiddles by saying each bit represents the presence or absense of a specific character - if the nth bit is 0, it sets it. If it's 1, the character that bit corresponds to already exists, so we know there are no unique characters.

However, unless you need the efficiency, avoid bit-twiddling algorithms, as they're not as self-evident in how they work as non-bit-twiddles.

Gonorrhea answered 28/6, 2013 at 4:56 Comment(0)

your implementation takes O(n2), author takes O(n). In your implementation, " if c in uchars:" , when it check if c in this array, it will have to go through the whole array, it takes time. So yours is not better than author's...

Acrefoot answered 6/8, 2013 at 16:4 Comment(0)

Solution 1:

def is_unique(string):
  if len(string) > 128:
    return False

  unique_tracker = [False] * 128
  for char in string:
    if unique_tracker[ord(char)] == False:
      unique_tracker[ord(char)] = True
    else:
      return False
  return True

Solution 2:

def is_unique_bit(string):
  if len(string) > 128:
  return False

  unique_tracker = 0
  for char in string:
    ascii_val = ord(char)
    if (unique_tracker & (1 << ascii_val)) > 0:
      return False
    unique_tracker |= (1 << ascii_val)
  return True

Ake answered 18/12, 2016 at 0:51 Comment(0)

The original question is along the lines of: Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?

Focus on the second sentence, It says we cannot use additional data structures, i.e you need to consider the space complexity of your solution. Your solution uses an array and therefore is not meeting the questions criteria.

Kahlil answered 7/10, 2017 at 19:6 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Beautiful is better than ugly.

Solution 1:

Solution 2:

Recommended topics

Hot tags