Sunday, 9 May 2021
LeetCode - 1062. Longest Repeating Substring
# Problem Statement
Given a string S, find out the length of the longest repeating substring(s). Return 0 if no repeating substring exists.
# Approach 1 : Dynamic Programming
The problem can be rephased as finding out the length of the longest common substring and we don't need to care about the occurrence of repeating substrings. We can solve it using dynamic programming. Let's define $ dp[i][j] $ as the maximum length of longest common subarray ending with S[i] and S[j].
The base case is when $ i $ or $ j $ is 0, then dp[i][j] must be 0. Else we can start from $ i = 1 .. n $ and $ j = 1 .. m $. If two characters are equal, we can set the current dp[i][j] to the previous state dp[i - 1][j - 1] plus one. Since we are looking for the maximum value here so for each update we check if the current state is greater than the current answer. The time complexity for this solution is $ O(n ^ 2) $.
```cpp
int longestRepeatingSubstring(string S) {
int n = S.size();
vector> dp(n + 1, vector(n + 1));
int ans = 0;
for(int i = 1; i <= n; i++) {
for(int j = i + 1; j <= n; j++) {
if(S[i - 1] == S[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1;
ans = max(ans, dp[i][j]);
}
}
}
return ans;
}
```
# Approach 2 : Binary Search + Sliding Windows + Set
Another approach is to use binary search to find out the window size $ L $ and put each string within this window to a set. If the string is already seen, that means this substring is repeating. The average time complexity would be $ O(n log n)$.
```cpp
int ok(string S, int m, int n) {
unordered_set s;
string tmp;
for(int i = 0; i < n - m + 1; i++) {
tmp = S.substr(i, m);
if(s.count(tmp) > 0) return i;
s.insert(tmp);
}
return -1;
}
int longestRepeatingSubstring(string S) {
int n = S.size();
int l = 1, r = n, m;
while(l <= r) {
m = l + (r - l) / 2;
if(ok(S, m, n) != -1) l = m + 1;
else r = m - 1;
}
return l - 1;
}
```
# Approach 3 : Binary Search + Sliding Windows + Set of Hash values
The previous approach can be further fine-tuned as the length of string can go up to 1500. We can reduce the memory consumption by storing the hash value of the string instead of the string itself.
```cpp
int ok(string S, int m, int n) {
unordered_set s;
string tmp;
for(int i = 0; i < n - m + 1; i++) {
tmp = compute_hash(S.substr(i, m));
if(s.count(tmp) > 0) return i;
s.insert(tmp);
}
return -1;
}
```
For the function ``compute_hash``, we can use a polynomial rolling hash function. Generally given the string S and the length N, the function can be defined as:
$$
H(S) = S[0] + S[1] * P + S[2] * P ^ 2 + ... + S[N - 1] * P ^ {N - 1} \pmod M
$$
$$
= \sum\limits_{i = 0}^{N - 1} {S[i] * P^i} \pmod M
$$
In this problem, the string S consists of only lowercase English letters from 'a' - 'z'. Hence, we can make the closest prime number which is greater or equal to the input alphabet. In this case, it is 31. If it contains both uppercase and lowercase, we could use $ p = 53 $. For M, we should use a large prime number. $ 10 ^ 9 + 9 $ will be used in this problem.
```cpp
string compute_hash(string const& s) {
const int p = 31;
const int m = 1e9 + 9;
long long hash_value = 0;
long long p_pow = 1;
for (char c : s) {
hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
p_pow = (p_pow * p) % m;
}
return to_string(hash_value);
}
```
Subscribe to:
Posts (Atom)
A Fun Problem - Math
# Problem Statement JATC's math teacher always gives the class some interesting math problems so that they don't get bored. Today t...
-
SHA stands for Secure Hashing Algorithm and 2 is just a version number. SHA-2 revises the construction and the big-length of the signature f...
-
Contest Link: [https://www.e-olymp.com/en/contests/19775](https://www.e-olymp.com/en/contests/19775) Full Solution: [https://github.com/...