Processing math: 100%

Saturday, 12 December 2020

Finding all occurrences of a pattern in a given string in linear time using Z Algorithm

Given a string S and a pattern P, find all occurences of P in S. Supposing the length of S is m and that of P is n, we can find the answer using Z Algorithm in linear time.

First, we need to construct a Z array where Z[i] is the length of longest common prefix between S and the suffix starting from S[i]. If Z[i] = 0, it means that S[0] != S[i] and the first element of Z is generally not defined.

vector<int> z_function(string s) {
    int n = (int) s.length();
    vector<int> z(n);
    for (int i = 1, l = 0, r = 0; i < n; ++i) {
        if (i <= r)
            z[i] = min (r - i + 1, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            ++z[i];
        if (i + z[i] - 1 > r)
            l = i, r = i + z[i] - 1;
    }
    return z;
}

For this standard string match problem, we can use Z Algorithm to solve it in O(m + n) on the string P + (something won't be matched in both string) + S. If there is an indice i with Z[i] = n, then we find one occurence.

Example:

S = zabcabdabc P = abc m = 10 n = 3

Let K = P + (something won't be matched in both string) + S

K = P + +S=abczabcabdabc

Z would be {0, 0, 0, 0, 3, 0, 0, 2, 0, 0, 3, 0, 0}

There are 2 indices i with Z[i] = n. Hence, there are 2 occurences of a pattern P in S.

Here's the code implemented in C++.

void solve() {
    string p, s; cin >> p >> s;
    string k = p + "$" + s;
    vector<int> z = z_function(k);
    int n = p.size(), m = k.size(), cnt = 0;
    REP(i, m) if(z[i] == n) cnt++;
    OUT(cnt);
}

No comments:

Post a Comment

A Fun Problem - Math

# Problem Statement JATC's math teacher always gives the class some interesting math problems so that they don't get bored. Today t...