Implementing an algorithm to determine if a string has all unique characters [closed]
Asked Answered
K

8

12

Context: I'm a CS n00b working my way through "Cracking the Coding Interview." The first problem asks to "implement an algorithm to determine if a string has all unique characters." My (likely naive) implementation is as follows:

def isUniqueChars2(string):
  uchars = []
  for c in string:
    if c in uchars:
      return False
    else:
      uchars.append(c)
  return True

The author suggests the following implementation:

def isUniqueChars(string):
  checker = 0
  for c in string:
    val = ord(c) - ord('a')
    if (checker & (1 << val) > 0):
      return False
    else:
      checker |= (1 << val)
  return True

What makes the author's implementation better than mine (FWIW, the author's solution was in Java and I converted it to Python -- is my solution one that is not possible to implement in Java)? Or, more generally, what is desirable in a solution to this problem? What is wrong with the approach I've taken? I'm assuming there are some fundamental CS concepts (that I'm not familiar with) that are important and help inform the choice of which approach to take to this problem.

Knoxville answered 28/6, 2013 at 4:47 Comment(8)
Don't name variables dict, especially if the variable in question isn't even a dict!Scroop
yours does a linear search because your dict is not a real dictMoulding
There is absolutely nothing wrong with your solutions, it will work, it's a python way to solve this problem.Panicstricken
Your solution works fine and is Pythonic. It would be quicker if you used a set rather than a list, for O(1) lookup. It's slower than the author's code, but far more Pythonic (presumably why the author's code wasn't written in Python...)Scroop
I see you made an edit to rename dict to uchars. May I suggest you revert this change, since several comments refer to this mistake, and several replies have already addressed your actual question? Changing the question underneath just makes things confusing to new visitors.Donets
Edited: changed dict to uchars. I had originally used a dictionary instead of a list.Knoxville
The author's solution subtly assumes 100% lowercase characters. uppercase, space and other characters will break it. This makes your solution much better.Artificial
More solutions here: #5278622Excrescence
T
46

Here is how I would write this:

def unique(s):
    return len(set(s)) == len(s)

Strings are iterable so you can pass your argument directly to set() to get a set of the characters from the string (which by definition will not contain any duplicates). If the length of that set is the same as the length of the original string then you have entirely unique characters.

Your current approach is fine and in my opinion it is much more Pythonic and readable than the version proposed by the author, but you should change uchars to be a set instead of a list. Sets have O(1) membership test so c in uchars will be considerably faster on average if uchars is a set rather than a list. So your code could be written as follows:

def unique(s):
    uchars = set()
    for c in s:
        if c in uchars:
            return False
        uchars.add(c)
    return True

This will actually be more efficient than my version if the string is large and there are duplicates early, because it will short-circuit (exit as soon as the first duplicate is found).

Trifocal answered 28/6, 2013 at 4:54 Comment(5)
I had thought the exact same thing, except mine was a lambda. Well done.Elainaelaine
That was indeed pythonic ...Artimas
Thank you for your response @F.J! Can you explain what you mean by "sets have O(1) membership test?"Knoxville
This is called "big O notation", it is a way to represent the time complexity of an algorithm as the input size changes. So for example O(n) is linear, as the input size grows the run time will grow at a constant rate. O(1) is constant time, so "sets have O(1) membership test" means that even as the size of the set grows, it will always take a fixed time to see if an element exists in the set.Trifocal
if c in uchars is also a O(1) test.Corvin
M
5

Beautiful is better than ugly.

Your approach is perfectly fine. This is python, when there are a bajillion ways to do something. (Yours is more beautiful too :)). But if you really want it to be more pythonic and/or make it go faster, you could use a set, as F.J's answer has described.

The second solution just looks really hard to follow and understand.

(PS, dict is a built-in type. Don't override it :p. And string is a module from the standard library.)

Margetts answered 28/6, 2013 at 4:54 Comment(5)
True but it is definitely possible to write a beautiful solution to this problem that runs faster than hisMoulding
@Moulding I was actually comparing his to the second oneMargetts
I know the second one is faster and uglier, I am saying that you can achieve elegance and speed in this caseMoulding
@Moulding Yes, of course :)!Margetts
besides I think that the bit shifting solution is beautiful in a wayMoulding
M
2
str = raw_input("Enter string: ")


def isUnique():
    test_dict = dict()
    is_unique = True
    for c in str:
        if(not test_dict.get(c, False)):
            test_dict[c] = c
        else:
            is_unique = False
            break
    if is_unique:
        return "String is unique"
    else:
        return "String is not unique"

print(isUnique())
Melantha answered 22/9, 2016 at 10:19 Comment(0)
M
1

Your solution is not incorrect but your variable dict is not actually a dictionary which means it has to do a linear search to check for the character. The solution from the book does the check in constant time. I will say that the other solution is obnoxiously unreadable because it uses setting the bits in a number to check if the char is unique or not

Moulding answered 28/6, 2013 at 4:54 Comment(0)
G
1

The solution you have translated from Java into Python is what's called a 'bit-twiddling' algorithm. The idea is that an integer can be treated in multiple ways: One, as a number. Two, as a collection of bits (32 off/ons, or 64, or what-have-you). The algorithm bit-twiddles by saying each bit represents the presence or absense of a specific character - if the nth bit is 0, it sets it. If it's 1, the character that bit corresponds to already exists, so we know there are no unique characters.

However, unless you need the efficiency, avoid bit-twiddling algorithms, as they're not as self-evident in how they work as non-bit-twiddles.

Gonorrhea answered 28/6, 2013 at 4:56 Comment(0)
A
0

your implementation takes O(n2), author takes O(n). In your implementation, " if c in uchars:" , when it check if c in this array, it will have to go through the whole array, it takes time. So yours is not better than author's...

Acrefoot answered 6/8, 2013 at 16:4 Comment(0)
A
0

Solution 1:

def is_unique(string):
  if len(string) > 128:
    return False

  unique_tracker = [False] * 128
  for char in string:
    if unique_tracker[ord(char)] == False:
      unique_tracker[ord(char)] = True
    else:
      return False
  return True

Solution 2:

def is_unique_bit(string):
  if len(string) > 128:
  return False

  unique_tracker = 0
  for char in string:
    ascii_val = ord(char)
    if (unique_tracker & (1 << ascii_val)) > 0:
      return False
    unique_tracker |= (1 << ascii_val)
  return True
Ake answered 18/12, 2016 at 0:51 Comment(0)
K
0

The original question is along the lines of: Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?

Focus on the second sentence, It says we cannot use additional data structures, i.e you need to consider the space complexity of your solution. Your solution uses an array and therefore is not meeting the questions criteria.

Kahlil answered 7/10, 2017 at 19:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.