Determining a string has all unique characters without using additional data structures and without the lowercase characters assumption
Asked Answered
S

7

9

This is one of the questions in the Cracking the Coding Interview book by Gayle Laakmann McDowell:

Implement an algorithm to determine if a string has all unique characters. What if you can not use additional data structures?

The author wrote:

We can reduce our space usage a little bit by using a bit vector. We will assume, in the below code, that the string is only lower case 'a' through 'z'. This will allow us to use just a single int.

The author has this implementation:

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0)
            return false;
        checker |= (1 << val);
    }
    return true;
}

Let's say we get rid of the assumption that "the string is only lower case 'a' through 'z'". Instead, the string can contain any kind of character—like ASCII characters or Unicode characters.

Is there a solution as efficient as the author's (or a solution that comes close to being as efficient as the author's)?

Related questions:

Senatorial answered 11/1, 2014 at 2:25 Comment(4)
Well you immediately need a different storage mechanism, because a-z already takes up 26 of the 32 bits in checker.Cymbiform
though probably not as fast as an int, a BitSet will be handy for larger data setsCalefacient
@Calefacient a BitSet probably violates the condition you can't use additional data structures. But then again, since the author is using an int as a set of 26 bits, you could argue that the int is really being used as a data structure and thus she's violated her own condition.Gerrard
@Gerrard Good point! I guess coming up a solution to the author's proposed problem is impossible without using some kind of data structure (like the int bit vector she used).Senatorial
C
6

for the asccii character set you can represent the 256bits in 4 longs: you basically hand code an array.

public static boolean isUniqueChars(String str) {
    long checker1 = 0;
    long checker2 = 0;
    long checker3 = 0;
    long checker4 = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i);
        int toCheck = val / 64;
        val %= 64;
        switch (toCheck) {
            case 0:
                if ((checker1 & (1L << val)) > 0) {
                    return false;
                }
                checker1 |= (1L << val);
                break;
            case 1:
                if ((checker2 & (1L << val)) > 0) {
                    return false;
                }
                checker2 |= (1L << val);
                break;
            case 2:
                if ((checker3 & (1L << val)) > 0) {
                    return false;
                }
                checker3 |= (1L << val);
                break;
            case 3:
                if ((checker4 & (1L << val)) > 0) {
                    return false;
                }
                checker4 |= (1L << val);
                break;
        }            
    }
    return true;
}

You can use the following code to generate the body of a similar method for unicode characters:

static void generate() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("long checker%d = 0;%n", i));
    }
    sb.append("for (int i = 0; i < str.length(); ++i) {\n"
            + "int val = str.charAt(i);\n"
            + "int toCheck = val / 64;\n"
            + "val %= 64;\n"
            + "switch (toCheck) {\n");
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("case %d:\n"
                + "if ((checker%d & (1L << val)) > 0) {\n"
                + "return false;\n"
                + "}\n"
                + "checker%d |= (1L << val);\n"
                + "break;\n", i, i, i));
    }
    sb.append("}\n"
            + "}\n"
            + "return true;");
    System.out.println(sb);
}
Calefacient answered 11/1, 2014 at 2:54 Comment(1)
stack is an "additional data stricture" tooPedestrianism
E
3

You only need one line... well less than one line actually:

if (str.matches("((.)(?!.*\\1))*"))

this uses a negative look ahead to assert that each character is not repeated later in the string.

This approach a time complexity of O(n^2), because for all n characters in the input, all characters that follow (there are n of those) are compared for equality.

Evaporimeter answered 11/1, 2014 at 3:27 Comment(6)
That is definitely using another data structure. :-)Nimble
@Nimble really? Define "data structure". A single int variable can be considered a data structure. There is no visible one here.Evaporimeter
First, there's an obvious String. Plus, all the regex stuff (Pattern, Matcher etc.) that will be generated behind the scenes. Admittedly, the "no data structures requirement" is pretty bogus, and this is still a good solution, hence the smiley in the original comment.Nimble
Well also this is not as efficient time complexity (in big O notation). This O(N^2) while the author asked for O(N) time algorithms.Afflict
@user3494047- Could you please explain how the time complexity of using regex is O(n^2)? ThanksCounterpoise
@Counterpoise I have added an explanationEvaporimeter
E
2

I think we need a general and practical definition of "additional data structures". Intuitively, we don't want to call every scalar integer or pointer a "data structure", because that makes nonsense of any prohibition of "additional data structures".

I propose we borrow a concept from big-O notation: an "additional data structure" is one that grows with the size of the data set.

In the present case, the code quoted by the OP appears to have a space requirement of O(1) because the bit vector happens to fit into an integer type. But as the OP implies, the general form of the problem is really O(N).

An example of a solution to the general case is to use two pointers and a nested loop to simply compare every character to every other. The space requirement is O(1) but the time requirement is O(N^2).

Eczema answered 11/1, 2014 at 6:23 Comment(0)
W
0

How about the following algorithm?

Steps:

Convert string to lowercase.

Loop through each character in the string

Set variable data = 0

Calculate offset = ascii value of first character in string - 97

Set the flag for that position with mask = 1 << offset

If bitwise AND returns true, then a character is repeating (mask & data), so break here.

else if we have not seen the character repeat yet, set the bit for that character by doing a bitwise OR by doing data = data | mask

Continue till end of characters.

Willena answered 11/1, 2014 at 10:58 Comment(0)
F
0

One-line solution without any extra data structure:

str.chars().distinct().count() == (int)str.length();
Floorer answered 21/4, 2020 at 11:50 Comment(0)
E
0

I am posting this answer as an idea or a suggestion. I am not sure of the truth-ness neither the running time complexity of this solution (but I do think the effective time complexity should not be more than O(n), but I am more than pleased to know if anyone of you wants to explain.


So, the idea goes like this. We use two pointers (I dont think they fall under the category of data-structure because otherwise things would be too hard). One called fast and other called slow. As in their name the fast pointer traverses faster than the slow one (2 indices at a time). We will keep on checking the character at these positions until one of the two things happen:

  1. Fast and slow point to same index (their is no point in comparing their characters now as they will be equal)
  2. s[slow] == s[fast] (as now 2 characters at different indices are equal we can return false)

Now we just reset the fast pointer and make it traverse along the string (one by one, not so fast anymore huh?) until it becomes equal to slow, checking if s[fast] == s[slow] holds true, incase it does return false.


So, the code in C++ is like this:

#include <bits/stdc++.h>
using namespace std;

int main() {
  string s;
  cin >> s;
  int n = s.size();
  int slow = 0, fast = 1;
  bool flag = false;
  //   cout << "Running..." << endl;
  while (slow != fast) {
    if (s[slow] == s[fast]) {
      flag = true;
      break;
    }
    // cout << "slow: " << s[slow] << " fast: " << s[fast] << endl;
    slow = (slow + 1) % n;
    fast = (fast + 2) % n;
  }
  fast = 0;
  while (!flag && fast != slow) {
    if (s[slow] == s[fast]) {
      flag = true;
      break;
    }
    fast = (fast + 1) % n;
  }
  if (flag) {
    cout << "Duplicates present." << endl;
  } else {
    cout << "No duplicates present." << endl;
  }
  return 0;
}
Engagement answered 16/1, 2021 at 14:46 Comment(0)
P
0
import math
def uniqueCharacters(str):
     
    # Assuming string can have characters
    # a-z this has 32 bits set to 0
    checker = 0
     
    for i in range(len(str)):
        
        bitAtIndex = ord(str[i]) - ord('a')
        
        # If that bit is already set in
        # checker, return False
        if ((bitAtIndex) >= 0):
            
            if ((checker & ((1 << bitAtIndex))) > 0):
                print('duplicate character: '+str[i])
                return False
                 
            # Otherwise update and continue by
            # setting that bit in the checker
            checker = checker | (1 << bitAtIndex)
            
 
    # No duplicates encountered, return True
    return True
 
# Driver Code
if __name__ == '__main__':
     
    input_word = input('Enter string: ')
    input_word = input_word.lower()
    if (uniqueCharacters(input_word)):
        print("The String " + input_word +
              " has all unique characters")
    else:
        print("The String " + input_word +
Paediatrics answered 15/4, 2021 at 0:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.