Two strings as XYZ and XZY

Asked 22/6, 2015 at 18:51 Answered 22/6, 2015 at 21:33

I have two strings, both the same length. And I've to check if them can be expressed as XYZ and XZY, where Y and Z are not empty.

My solution is to 'eat' the same first letters of both strings and then find Longest Common Substring for rest. Then check if rest of first and rest of second string (without LCS) are equal. Problem is, that I heard of O(N) memory complexity algorithm for this, but all that I found is O(MN). I have limited memory, so it's important for me.

Second solution is to use "(.*)(.+)(.+)\1\3\2" regex, but this is very poor solution.

Anyone have other ideas?

Handlebar answered 22/6, 2015 at 18:51 Comment(13)

How about for example, ZXY? – Erik 22/6, 2015 at 19:3

No, no ZXY. E. g. "kacper", "perkac" – Handlebar 22/6, 2015 at 19:4

what if the string is of odd length, like "kaciper", would it be "perikac"? – Erik 22/6, 2015 at 19:9

Y and Z don't have to be the same length - "kaciper" and "perikac" no, but "kaciper" and "perkaci" yes. – Handlebar 22/6, 2015 at 19:12

With plain brute force you need a single buffer for each YZ permutation - that is 'O(N) memory complexity' – Thenceforward 22/6, 2015 at 19:21

"kaciper" and "perkaci" should be XYZ and XZY. Don't get it. What is X? What is Y? What is Z? – Zoba 22/6, 2015 at 19:21

substrings. Or parts of string, how did u call it. – Handlebar 22/6, 2015 at 19:24

Eating the longest common prefix doesn't work. abcbd / abdbc – Crelin 22/6, 2015 at 19:24

@DieterLücking, that's why I'm asking for better ideas – Handlebar 22/6, 2015 at 19:25

Still brute force: if you have no match, shorten the prefix and try to match YZ and ZY (the memory complexity does not change) – Thenceforward 22/6, 2015 at 19:31

Should abcabc / abcabc be a match? – Leprosy 22/6, 2015 at 19:43

@nonono: (forgive the nonstandard notation) The two substrings (sans X) S and T have proper prefix/suffix Y and Z iff len(S) == len(T) and S ⊂ TT (because TT == ZYZY). Now, apply any linear string search algorithm, e.g. KMP. Here is a related question. – Petrinapetrine 22/6, 2015 at 20:29

@CássioRenan: From the question, X can be empty, Y and Z must not be empty, and Y and Z can be equal, so the answer is yes. – Petrinapetrine 24/6, 2015 at 9:48

Something like this perhaps:

bool test(string& s1, string& s2)
{
    if (s1.size() != s2.size())
    {
        return false;
    }
    for (size_t i = 1; i < s1.size()-2; i++)
    {
        if (s1.substr(0,i) != s2.substr(0,i)) return false;

        for (size_t j = 1; j < s1.size()-i; j++)
        {
            string x1 =s1.substr(i,j);
            string x2 =s1.substr(i+j);
            string y1 = s2.substr(i, s1.size()-i - j);
            string y2 = s2.substr(s1.size()-j);
            if ((x1==y2) && (x2==y1))
            {
                cout << "X: " << s1.substr(0,i) << endl;
                cout << "Y Z: " << x1 << " " << x2 << endl;
                cout << "Z Y: "<< y1 << " " << y2 << endl << endl;
                return true;
            }
        }

    }
    return false;
}

int main()
{
    string s1 {"abcbd"};
    string s2 {"abdbc"};

    if (test(s1, s2))
    {
        cout << "OK" << endl;
    }
    else
    {
        cout << "NOT OK" << endl;
    }
    return 0;
}

If memory is a real big issue for you, you can avoid the substrings by comparing char by char. For instance:

        if (s1.substr(0,i) != s2.substr(0,i)) return false;

could be replaced by

        for (size_t z = 0; z < i; z++)
        {
            if (s1[z] != s2[z]) return false;
        }

Likewise you can avoid the strings x1, x2, y1 and y2 by introducing two similar for-loops where you compare char-by-char instead.

The code will be more difficult to read and understand but it will use less memory.

bool test(string& s1, string& s2)
{
    size_t N = s1.size();
    if (N != s2.size())
    {
        // s1 and s2 must have same length
        return false;
    }

    // i is length of X
    for (size_t i = 1; i < s1.size()-2; i++)
    {
        // Check that X of length i exists in both s1 and s2
        for (size_t k = 0; k < i; k++)
        {
             if (s1[k] != s2[k]) return false;
        }

        // Now check for YZ in s1[i..N-1] and ZY in s2[i..N-1]
        //
        // j is length of Y
        // N-i-j is length of Z
        for (size_t j = 1; j < N-i; j++)
        {
            // Is s1[i..i+j-1] == s2[N-j..N-1]?
            bool b1 = true;
            for (size_t k = 0; k < j; k++)
            {
                if (s1[i+k] != s2[N-j+k]) b1=false;
            }

            // Is s1[i+j..N-1] == s2[i..N-j-1]?
            bool b2 = true;
            for (size_t k = 0; k < (N-i-j); k++)
            {
                if (s1[i+j+k] != s2[i+k]) b2=false;
            }

            if (b1 && b2)
            {
                // Success!

                // For debug the value of i and j
                // can be used to output the substrings
                // using for-loops
                cout << "X=";
                for (size_t k = 0; k < i; k++)
                {
                    cout << s1[k];
                }
                cout << endl;

                cout << "Y=";
                for (size_t k = i; k < (i+j); k++)
                {
                    cout << s1[k];
                }
                cout << endl;

                cout << "Z=";
                for (size_t k = (i+j); k < N; k++)
                {
                    cout << s1[k];
                }
                cout << endl;


                return true;
            }
        }
    }
    return false;
}

int main()
{
    string s1 {"abcbd"};
    string s2 {"abdbc"};

    if (test(s1, s2))
    {
        cout << "OK" << endl;
    }
    else
    {
        cout << "NOT OK" << endl;
    }
    return 0;
}

Zoba answered 22/6, 2015 at 20:21 Comment(6)

Nice work. And already implemented? I'm positively suprised. However, it requires non-empty X, but this can be easily bypassed by inserting e. g. 'a' on begin of both strings. – Handlebar 23/6, 2015 at 17:26

And your optimization - it doesn't work (always false). No matter, i'll deal with it. – Handlebar 23/6, 2015 at 17:28

@Handlebar - I'll look at the optimization and get back to you. X will be minimum one char because the first for-loop starts from 1. – Zoba 23/6, 2015 at 17:33

@Handlebar - I made a stupid mistake. The line if (s1[i+k] != s2[N-j]) b1=false; was miss a +k It should be if (s1[i+k] != s2[N-j+k]) b1=false; I'll fix the answer. – Zoba 23/6, 2015 at 17:44

Thanks a second time. – Handlebar 23/6, 2015 at 17:53

@Handlebar - BTW, performance can be improved by changing b1=false; and b2=false; with continue; and delete the if (b1 && b2) and also delete bool b1 = false; and bool b2 = false;. This code is also O(1) memory. – Zoba 23/6, 2015 at 18:6

First, stop copying stuff:

template<class T>
struct array_view {
  T* b = 0;
  T* e = 0;
  T* begin()const{return b;}
  T* end()const{return e;}

  // defaults:
  array_view()=default;
  array_view(array_view const&)=default;
  array_view& operator=(array_view const&)=default;
  ~array_view()=default;

  array_view( T* s, size_t n ):array_view(s, s+n){}
  array_view( T* s, T* f ):b(s),e(f){}

  using mutable_T = typename std::remove_const<T>::type;

  template<size_t N>
  array_view( T(&arr)[N] ):array_view(arr, N){}
  template<size_t N>
  array_view( std::array<T,N>&arr ):array_view(arr.data(), N){}
  template<size_t N>
  array_view( std::array<mutable_T,N>const&arr ):array_view(arr.data(), N){}

  // similar for std::vector:
  template<class...Ts>
  array_view( std::basic_string<mutable_T, Ts...> const& src):
    array_view(src.data(), src.size())
  {}
  template<class...Ts>
  array_view( std::basic_string<T, Ts...>& src):
    array_view(
      src.empty()?
        array_view():
        array_view(std::addressof(src[0]),src.size())
    )
  {}

  T& back() const { return *std::prev(end()); }
  T& front() const { return *begin(); }
  size_t size() const { return end()-begin(); }
  bool empty() const { return begin()==end(); }

  // slicing functions:
  array_view front( size_t n ) const {
    if (size() <= n) return *this;
    return {begin(), n};
  }
  array_view back( size_t n ) const {
    if (size() <= n) return *this;
    return {end()-n, n};
  }
  array_view trim_front( size_t n ) const {
    return back( size()-n );
  }
  array_view trim_back( size_t n ) const {
    return front( size()-n );
  }
  array_view sub( size_t start, size_t len ) const {
    if (start >= size()) return {};
    len = (std::min)( size()-start, len );
    return {begin()+start, len};
  }

  // comparisons:
  friend bool operator==( array_view lhs, array_view rhs ) {
    if (lhs.size()!=rhs.size()) return false;
    return std::equal( lhs.begin(), lhs.end(), rhs.begin() );
  }
  friend bool operator!=( array_view lhs, array_view rhs ) {
    return !(lhs==rhs);
  }
  friend bool operator<( array_view lhs, array_view rhs ) {
    return std::lexicographical_compare(
      lhs.begin(), lhs.end(),
      rhs.begin(), rhs.end()
    );
  }
  friend bool operator>( array_view lhs, array_view rhs ) {
    return rhs<lhs;
  }
  friend bool operator<=( array_view lhs, array_view rhs ) {
    return !(lhs>rhs);
  }
  friend bool operator>=( array_view lhs, array_view rhs ) {
    return !(lhs<rhs);
  }
};

an array_view is a range without owning. It doesn't support char traits, but I don't care much.

using string_view = array_view<const char>;

size_t common_prefix( string_view lhs, string_view rhs ) {
  auto itl = lhs.begin();
  auto itr = rhs.begin();
  while (itl != lhs.end() && itr != rhs.end()) {
    if (*itl != *itr) break;
    ++itl; ++itr;
  }
  return itl-lhs.begin();
}

gives us the longest common prefix of lhs and rhs.

Now all we have to do is recognize YZ vs ZY fast and efficiently.

bool is_yz_zy( string_view lhs, string_view rhs ) {
  if (lhs.size() < 2) return false;
  if (lhs.size() != rhs.size()) return false;
  for (size_t i = 1; i < lhs.size(); ++i) {
    if (lhs.front(i)==rhs.back(i)) {
      if (lhs.trim_front(i) == rhs.trim_back(i)) {
        return true;
      }
    }
  }
  return false;
}

and stitch:

bool is_xyz_xzy( string_view lhs, string_view rhs ) {
  size_t max_x = common_prefix(lhs, rhs);
  for (size_t i = 0; i <= max_x; ++i) {
    if (is_yz_zy( lhs.trim_front(i), rhs.trim_front(i) ))
      return true;
  }
  return false;
}

uses O(1) memory.

live example

optimization time.

Do an xor scan. Now the only lengths of x that are possible are those such that the xor scans are equal at that index, and the xor scan of the entire strings are equal.

Similarly for yz zy detection, the left xor scan at index i must equal the right xor scan at index length-i xor the right xor scan at length for i to be the length of y.

A stronger hash with still friendly properties will make pathological cases less obvious, but the xor scan should help plenty.

An xor scan is the xor of all previous characters. This can be done in-place within the strings, replacing each char with the xor of itself and all previous chars. The operation is easily inverted, and takes linear time.

String comparison takes a bit of care, but you just xor each scan entry with the previous scan to get the original char.

Here is an implementation of the xor-optimized version. Experimentally it does ~5 n^2 character operations, but it will have n^3 cases.

Davies answered 22/6, 2015 at 21:33 Comment(6)

Cool, but it still uses O(N^3) time in pathological cases. I guess OP doesn't care about that. – Crelin 22/6, 2015 at 23:8

Also nice, but as @Crelin mentioned, Eating the longest common prefix doesn't work. abcbd / abdbc – Handlebar 23/6, 2015 at 17:35

@Handlebar The above code does not eat the longest prefix. It does find the longest prefix, but it eats prefixes in length from 1 to the length of the longest prefix. – Davies 23/6, 2015 at 17:36

@Handlebar Typos fixed, and now includes running example on exactly those two strings. – Davies 23/6, 2015 at 17:41

Your solution doesn't work on, for instance, "kkkeeettt" / "kkkttteee". – Petrinapetrine 24/6, 2015 at 9:36

@mich Ah, did a < instead of <= in max_x loop -- off by 1 error. – Davies 24/6, 2015 at 12:45

Not verified, food for thought.

Reverse the second string and attach to the first, e.g. XYZY'Z'X'. (or the other way around)
Find the longest possible X == X' (some algo discussion may needed here), where size(X) <= len(XYZY'Z'X')/2
Iterate from size(X) to 0. In each iteration, remove X from both ends so the string is YZY'Z', and verify that YZ == Y'Z' by partition the string in the middle, and return True if verified.
After the iteration, return false.

Permissible answered 22/6, 2015 at 20:17 Comment(3)

How is Y'Z'X' the reverse of XYZ? – Unintentional 22/6, 2015 at 20:45

YZX is reverse of XZY (the second string). – Permissible 22/6, 2015 at 21:36

Oh, right, just as you said in step 1. Sorry. Steps 2 and 3 still look suspicious, though. In step 2, you look for X = X', but from step 1, X' is the reverse of X. In step 3, you check for YZ=Y'Z', but from step 1, Y'Z' is the reverse of ZY. The reverse of the string represented by ZY is not the string represented by YZ. Consider Y=001, Z=002, YZ=001002, ZY=002001, Y'Z'=100200, Y'Z' != YZ. – Unintentional 23/6, 2015 at 2:48

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags