Finding the Number of Times an Expression Occurs in a String Continuously and Non Continuously
Asked Answered
E

7

9

I had a coding interview over the phone and was asked this question:

Given a String (for example):

"aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc"

and an expression (for example):

"a+b+c-"

where:

+: means the char before it is repeated 2 times

-: means the char before it is repeated 4 times

Find the number of times the given expression appears in the string with the operands occurring non continuously and continuously.

The above expression occurs 4 times:

1) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
        ^^       ^^       ^^^^                    
        aa       bb       cccc
2) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
        ^^       ^^                               ^^^^
        aa       bb                               cccc

3) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
        ^^                                ^^      ^^^^
        aa                                bb      cccc

4) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
                                       ^^ ^^      ^^^^
                                       aa bb      cccc

I had no idea how to do it. I started doing an iterative brute force method with lots of marking of indices but realized how messy and hard that would to code half way through:

import java.util.*;

public class Main {

    public static int count(String expression, String input) {
        int count = 0;
        ArrayList<char[]> list = new ArrayList<char[]>();

        // Create an ArrayList of chars to iterate through the expression and match to string
        for(int i = 1; i<expression.length(); i=i+2) {
            StringBuilder exp = new StringBuilder();
            char curr = expression.charAt(i-1);
            if(expression.charAt(i) == '+') {
                exp.append(curr).append(curr);
                list.add(exp.toString().toCharArray());
            }
            else { // character is '-'
                exp.append(curr).append(curr).append(curr).append(curr);
                list.add(exp.toString().toCharArray());
            }
        }

        char[] inputArray = input.toCharArray();
        int i = 0; // outside pointer
        int j = 0; // inside pointer
        while(i <= inputArray.length) {
            while(j <= inputArray.length) {
                for(int k = 0; k< list.size(); k++) {
                    /* loop through 
                     * all possible combinations in array list
                     * with multiple loops
                     */
                }
                j++;
            }
            i++;
            j=i;
        }
        return count;
    }

    public static void main(String[] args) {
        String expression = "a+b+c-";
        String input = "aaksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
        System.out.println("The expression occurs: "+count(expression, input)+" times");
    }
}

After spending a lot of time doing it iteratively he mentioned recursion and I still couldn't see a clear way doing it recursively and I wasn't able to solve the question. I am trying to solve it now post-interview and am still not sure how to go about this question. How should I go about solving this problem? Is the solution obvious? I thought this was a really hard question for a coding phone interview.

Electrophoresis answered 29/4, 2015 at 15:47 Comment(8)
I too think of bruteforcing it. But I'll try to do it in my way..Windjammer
is it a hard question? or is it just me?Electrophoresis
@Kingsley no it is not hard. Or you'd like working and tested code?Bonucci
@Sasha Salauyou I thought it was hard for an interview question, I guess I just have to practice more!Electrophoresis
@Kingsley yep, just practice. If you don't understand something in my answer, please ask. BTW, what was time limit for this?Bonucci
@Sasha Salauyou I had 45 minutes. I spent most of the time trying to think and code it iteratively then after I failed he mentioned recursion and I spent about 5 minutes at the end trying to think of a way to do it recursively but couldn't come up with a solution. I am trying to code your algorithm now (and waiting to see if anyone else chimes in with ideas).Electrophoresis
If one haven't done something this before 45 min may be good enough to give A solution but to really solve problem with optimized code this is not enough time.Eponymous
Seems like a confusing question to be asked over the phone.Yazbak
B
4

Non-recursion algorithm that requires O(m) space and operates in O(n*m), where m is number of tokens in query:

@Test
public void subequences() {

    String input = "aabbccaacccccbbd";
    String query = "a+b+";

    // here to store tokens of a query: e.g. {a, +}, {b, +}
    char[][] q = new char[query.length() / 2][];

    // here to store counts of subsequences ending by j-th token found so far
    int[] c =  new int[query.length() / 2];   // main
    int[] cc = new int[query.length() / 2];   // aux        

    // tokenize
    for (int i = 0; i < query.length(); i += 2)
        q[i / 2] = new char[] {query.charAt(i), query.charAt(i + 1)};

    // init
    char[] sub2 = {0, 0};        // accumulator capturing last 2 chars
    char[] sub4 = {0, 0, 0, 0};  // accumulator capturing last 4 chars

    // main loop
    for (int i = 0; i < input.length(); i++) {

        shift(sub2, input.charAt(i));
        shift(sub4, input.charAt(i));

        boolean all2 = sub2[1] != 0 && sub2[0] == sub2[1];  // true if all sub2 chars are same
        boolean all4 = sub4[3] != 0 && sub4[0] == sub4[1]   // true if all sub4 chars are same
              && sub4[0] == sub4[2] && sub4[0] == sub4[3];

        // iterate tokens
        for (int j = 0; j < c.length; j++) {

            if (all2 && q[j][1] == '+' && q[j][0] == sub2[0]) // found match for "+" token
                cc[j] = j == 0             // filling up aux array
                      ? c[j] + 1           // first token, increment counter by 1
                      : c[j] + c[j - 1];   // add value of preceding token counter

            if (all4 && q[j][1] == '-' && q[j][0] == sub4[0]) // found match for "-" token
                cc[j] = j == 0 
                      ? c[j] + 1 
                      : c[j] + c[j - 1];
        }
        if (all2) sub2[1] = 0;  // clear, to make "aa" occur in "aaaa" 2, not 3 times
        if (all4) sub4[3] = 0;
        copy(cc, c);            // copy aux array to main 
        }
    }
    System.out.println(c[c.length - 1]);
}


// shifts array 1 char left and puts c at the end
void shift(char[] cc, char c) {
    for (int i = 1; i < cc.length; i++)
        cc[i - 1] = cc[i];
    cc[cc.length - 1] = c;
}

// copies array contents 
void copy(int[] from, int[] to) {
    for (int i = 0; i < from.length; i++)
        to[i] = from[i];
}

The main idea is to catch chars from the input one by one, holding them in 2- and 4-char accumulators and check if any of them match some tokens of the query, remembering how many matches have we got for sub-queries ending by these tokens so far.

Query (a+b+c-) is splitted into tokens (a+, b+, c-). Then we collect chars in accumulators and check if they match some tokens. If we find match for first token, we increment its counter by 1. If we find match for another j-th token, we can create as many additional subsequences matching subquery composed of tokens [0...j], as many of them now exist for subquery composed of tokens [0... j-1], because this match can be appended to every of them.

For example, we have:

a+ : 3  (3 matches for a+)
b+ : 2  (2 matches for a+b+)
c- : 1  (1 match for a+b+c-) 

when cccc arrives. Then c- counter should be increased by b+ counter value, because so far we have 2 a+b+ subsequences and cccc can be appended to both of them.

Bonucci answered 29/4, 2015 at 19:43 Comment(2)
Looks good to me! +1. I would actually still count aa as appearing 3 times in aaaa, but the question isn't clear about what is preferred.Gersham
Great. Small fix might be needed for exp a+a+ for string aa.Eponymous
G
3

Let's call the length of the string n, and the length of the query expression (in terms of the number of "units", like a+ or b-) m.

It's not clear exactly what you mean by "continuously" and "non-continuously", but if "continuously" means that there can't be any gaps between query string units, then you can just use the KMP algorithm to find all instances in O(m+n) time.

We can solve the "non-continuous" version in O(nm) time and space with dynamic programming. Basically, what we want to compute is a function:

f(i, j) = the number of occurrences of the subquery consisting of the first i units
          of the query expression, in the first j characters of the string.

So with your example, f(2, 41) = 2, since there are 2 separate occurrences of the subpattern a+b+ in the first 41 characters of your example string.

The final answer will then be f(n, m).

We can compute this recursively as follows:

f(0, j) = 0
f(i, 0) = 0
f(i > 0, j > 0) = f(i, j-1) + isMatch(i, j) * f(i-1, j-len(i))

where len(i) is the length of the ith unit in the expression (always 2 or 4) and isMatch(i, j) is a function that returns 1 if the ith unit in the expression matches the text ending at position j, and 0 otherwise. For example, isMatch(15, 2) = 1 in your example, because s[14..15] = bb. This function takes just constant time to run, because it never needs to check more than 4 characters.

The above recursion will already work as-is, but we can save time by making sure that we only solve each subproblem once. Because the function f() depends only on its 2 parameters i and j, which range between 0 and m, and between 0 and n, respectively, we can just compute all n*m possible answers and store them in a table.

[EDIT: As Sasha Salauyou points out, the space requirement can in fact be reduced to O(m). We never need to access values of f(i, k) with k < j-1, so instead of storing m columns in the table we can just store 2, and alternate between them by always accessing column m % 2.]

Gersham answered 29/4, 2015 at 18:49 Comment(3)
@SashaSalauyou: Good point. I've edited to explain how to do this.Gersham
Please check out my non-recursive solution: https://mcmap.net/q/1206342/-finding-the-number-of-times-an-expression-occurs-in-a-string-continuously-and-non-continuouslyBonucci
@SashaSalauyou: Please upvote the non-recursive solution I posted before you posted yours :)Gersham
T
1

Wanted to try it for myself and figured I could then share my solution as well. The parse method obviously has issues when there is indeed a char 0 in the expression (although that would probably be the bigger issue itself), the find method will fail for an empty needles array and I wasn't sure if ab+c- should be considered a valid pattern (I treat it as such). Note that this covers only the non-continous part so far.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Matcher {

  public static void main(String[] args) {
    String haystack = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
    String[] needles = parse("a+b+c-");
    System.out.println("Needles: " + Arrays.toString(needles));
    System.out.println("Found: " + find(haystack, needles, 0));
    needles = parse("ab+c-");
    System.out.println("Needles: " + Arrays.toString(needles));
    System.out.println("Found: " + find(haystack, needles, 0));
  }

  private static int find(String haystack, String[] needles, int i) {
    String currentNeedle = needles[i];
    int pos = haystack.indexOf(currentNeedle);
    if (pos < 0) {
      // Abort: Current needle not found
      return 0;
    }
    // Current needle found (also means that pos + currentNeedle.length() will always
    // be <= haystack.length()
    String remainingHaystack = haystack.substring(pos + currentNeedle.length());
    // Last needle?
    if (i == needles.length - 1) {
      // +1: We found one match for all needles
      // Try to find more matches of current needle in remaining haystack
      return 1 + find(remainingHaystack, needles, i);
    }
    // Try to find more matches of current needle in remaining haystack
    // Try to find next needle in remaining haystack
    return find(remainingHaystack, needles, i) + find(remainingHaystack, needles, i + 1);
  }

  private static String[] parse(String expression) {
    List<String> searchTokens = new ArrayList<String>();
    char lastChar = 0;
    for (int i = 0; i < expression.length(); i++) {
      char c = expression.charAt(i);
      char[] chars;
      switch (c) {
        case '+':
          // last char is repeated 2 times
          chars = new char[2];
          Arrays.fill(chars, lastChar);
          searchTokens.add(String.valueOf(chars));
          lastChar = 0;
          break;
        case '-':
          // last char is repeated 4 times
          chars = new char[4];
          Arrays.fill(chars, lastChar);
          searchTokens.add(String.valueOf(chars));
          lastChar = 0;
          break;
        default:
          if (lastChar != 0) {
            searchTokens.add(String.valueOf(lastChar));
          }
          lastChar = c;
      }
    }
    return searchTokens.toArray(new String[searchTokens.size()]);
  }
}

Output:

Needles: [aa, bb, cccc]
Found: 4
Needles: [a, bb, cccc]
Found: 18
Thilde answered 29/4, 2015 at 19:9 Comment(0)
B
0

Recursion may be the following (pseudocode):

int search(String s, String expression) {

     if expression consists of only one token t /* e. g. "a+" */ {
         search for t in s
         return number of occurrences
     } else {
         int result = 0
         divide expression into first token t and rest expression
         // e. g. "a+a+b-" -> t = "a+", rest = "a+b-"
         search for t in s
         for each occurrence {
             s1 = substring of s from the position of occurrence to the end
             result += search(s1, rest) // search for rest of expression in rest of string
         }
         return result
     }
}   

Applying this to entire string, you'll get number of non-continuous occurrences. To get continuous occurrences, you don't need recursion at all--just transform expression into string and search by iteration.

Bonucci answered 29/4, 2015 at 16:3 Comment(7)
You would be doing lots of duplicate searching at this step. search for t in s as you have done same search with sibling node.Eponymous
@Eponymous no, why? Suppose input = aabbaabb, exp = aa, bb. Here I've got 3 occurrences. Now. Call on initial string: search("aabbaabb", "aa, bb"). It searches for "aa", finds first "aa", then calls search("bbaabb", "bb") (which returns 2, because "bb" is 1-token expression which occurs in "bbaabb" twice). result = 0 + 2 = 2. Then it finds second "aa" and calls search("bb", "bb"), which returns 1. result = 2 + 1 = 3. End.Bonucci
@Eponymous I call not the whole expression and whole string. In recursive call I pass rest of expression and rest of a string. You're doing the same by searchFromIndex and currentTokenIndexBonucci
you doing substring of passed string in method and doing search all occurrence of a token, this is duplicate search. In your example above you did last bb search 2 times.Eponymous
@Eponymous in the example OP provided last "cccc" is found 3 times and participates in 3 occurrences out of 4. Don't you call it "duplicate"?Bonucci
by duplicate I mean doing same processing again, in the example you given search for bb was already done in first iteration of aa and same searching is being done for second iteration of aa as well.Eponymous
@Eponymous I believe there is some another algorithm which operates in O(N^2) or O(n*p) instead of O(N^p), but I'm sure it is not recursive.Bonucci
C
0

How about preprocessing aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc?

This become a1k1s1d1b1a2l1a1s1k1d1h1f1b2l1a1j1d1f1h1a1c4a1o1u1d1g1a1l1s1a2b2l1i1s1d1f1h1c4

Now find occurrences of a2, b2, c4.

Candlepower answered 29/4, 2015 at 16:47 Comment(2)
"aaaa" should match "a+a+" or not?Bonucci
Would be better if you preprocess search token a+b+c- into aa, bb, cccc and now search for those token in input string.Eponymous
E
0

Tried it code below but right now it gives only first possible match based of depth first.

Need to be changed to do all possible combination instead of just first

import java.util.ArrayList;
import java.util.List;

public class Parsing {
    public static void main(String[] args) {
        String input = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
        System.out.println(input);

        for (int i = 0; i < input.length(); i++) {
            System.out.print(i/10);
        }
        System.out.println();

        for (int i = 0; i < input.length(); i++) {
            System.out.print(i%10);
        }
        System.out.println();

        List<String> tokenisedSearch = parseExp("a+b+c-");
        System.out.println(tokenisedSearch);

        parse(input, 0, tokenisedSearch, 0);
    }

    public static boolean parse(String input, int searchFromIndex, List<String> tokensToSeach, int currentTokenIndex) {
        if(currentTokenIndex >= tokensToSeach.size())
            return true;
        String token = tokensToSeach.get(currentTokenIndex);
        int found = input.indexOf(token, searchFromIndex);
        if(found >= 0) {
            System.out.println("Found at Index "+found+ " Token " +token);
            return parse(input, searchFromIndex+1, tokensToSeach, currentTokenIndex+1);
        }
        return false;
    }

    public static List<String> parseExp(String exp) {
        List<String> list = new ArrayList<String>();
        String runningToken = "";
        for (int i = 0; i < exp.length(); i++) {
            char at = exp.charAt(i);
            switch (at) { 
            case '+' :
                runningToken += runningToken;
                list.add(runningToken);
                runningToken = "";
                break;
            case '-' :
                runningToken += runningToken;
                runningToken += runningToken;
                list.add(runningToken);
                runningToken = "";
                break;
            default :
                runningToken += at;
            }
        }
        return list;
    }
}
Eponymous answered 29/4, 2015 at 17:5 Comment(0)
C
0

If you convert the search string first with a simple parser/compiler so a+ becomes aa etc. then you can simply take this string and run a regular expression match against your hay stack. (Sorry, I'm no Java coder so can't deliver any real code but it is not really difficult)

Collegiate answered 9/5, 2015 at 15:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.