Saturday, 12 December 2020

Finding all occurrences of a pattern in a given string in linear time using Z Algorithm

Given a string S and a pattern P, find all occurences of P in S. Supposing the length of S is m and that of P is n, we can find the answer using Z Algorithm in linear time. First, we need to construct a Z array where Z[i] is the length of longest common prefix between S and the suffix starting from S[i]. If Z[i] = 0, it means that S[0] != S[i] and the first element of Z is generally not defined. ``` vector<int> z_function(string s) { int n = (int) s.length(); vector<int> z(n); for (int i = 1, l = 0, r = 0; i < n; ++i) { if (i <= r) z[i] = min (r - i + 1, z[i - l]); while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i]; if (i + z[i] - 1 > r) l = i, r = i + z[i] - 1; } return z; } ``` For this standard string match problem, we can use Z Algorithm to solve it in O(m + n) on the string P + (something won't be matched in both string) + S. If there is an indice i with Z[i] = n, then we find one occurence. Example: S = zabcabdabc P = abc m = 10 n = 3 Let K = P + (something won't be matched in both string) + S K = P + $ + S = abc$zabcabdabc Z would be {0, 0, 0, 0, 3, 0, 0, 2, 0, 0, 3, 0, 0} There are 2 indices ``i`` with Z[i] = n. Hence, there are 2 occurences of a pattern P in S. Here's the code implemented in C++. ``` void solve() { string p, s; cin >> p >> s; string k = p + "$" + s; vector z = z_function(k); int n = p.size(), m = k.size(), cnt = 0; REP(i, m) if(z[i] == n) cnt++; OUT(cnt); } ```

No comments:

Post a Comment

A Fun Problem - Math

# Problem Statement JATC's math teacher always gives the class some interesting math problems so that they don't get bored. Today t...