Brownstone: May 2021

# Problem Statement Given a string S, find out the length of the longest repeating substring(s). Return 0 if no repeating substring exists. # Approach 1 : Dynamic Programming The problem can be rephased as finding out the length of the longest common substring and we don't need to care about the occurrence of repeating substrings. We can solve it using dynamic programming. Let's define $ dp[i][j] $ as the maximum length of longest common subarray ending with S[i] and S[j]. The base case is when $ i $ or $ j $ is 0, then dp[i][j] must be 0. Else we can start from $ i = 1 .. n $ and $ j = 1 .. m $. If two characters are equal, we can set the current dp[i][j] to the previous state dp[i - 1][j - 1] plus one. Since we are looking for the maximum value here so for each update we check if the current state is greater than the current answer. The time complexity for this solution is $ O(n ^ 2) $. ```cpp int longestRepeatingSubstring(string S) { int n = S.size(); vector> dp(n + 1, vector(n + 1)); int ans = 0; for(int i = 1; i <= n; i++) { for(int j = i + 1; j <= n; j++) { if(S[i - 1] == S[j - 1]) { dp[i][j] = dp[i - 1][j - 1] + 1; ans = max(ans, dp[i][j]); } } } return ans; } ``` # Approach 2 : Binary Search + Sliding Windows + Set Another approach is to use binary search to find out the window size $ L $ and put each string within this window to a set. If the string is already seen, that means this substring is repeating. The average time complexity would be $ O(n log n)$. ```cpp int ok(string S, int m, int n) { unordered_set s; string tmp; for(int i = 0; i < n - m + 1; i++) { tmp = S.substr(i, m); if(s.count(tmp) > 0) return i; s.insert(tmp); } return -1; } int longestRepeatingSubstring(string S) { int n = S.size(); int l = 1, r = n, m; while(l <= r) { m = l + (r - l) / 2; if(ok(S, m, n) != -1) l = m + 1; else r = m - 1; } return l - 1; } ``` # Approach 3 : Binary Search + Sliding Windows + Set of Hash values The previous approach can be further fine-tuned as the length of string can go up to 1500. We can reduce the memory consumption by storing the hash value of the string instead of the string itself. ```cpp int ok(string S, int m, int n) { unordered_set s; string tmp; for(int i = 0; i < n - m + 1; i++) { tmp = compute_hash(S.substr(i, m)); if(s.count(tmp) > 0) return i; s.insert(tmp); } return -1; } ``` For the function ``compute_hash``, we can use a polynomial rolling hash function. Generally given the string S and the length N, the function can be defined as: $$ H(S) = S[0] + S[1] * P + S[2] * P ^ 2 + ... + S[N - 1] * P ^ {N - 1} \pmod M $$ $$ = \sum\limits_{i = 0}^{N - 1} {S[i] * P^i} \pmod M $$ In this problem, the string S consists of only lowercase English letters from 'a' - 'z'. Hence, we can make the closest prime number which is greater or equal to the input alphabet. In this case, it is 31. If it contains both uppercase and lowercase, we could use $ p = 53 $. For M, we should use a large prime number. $ 10 ^ 9 + 9 $ will be used in this problem. ```cpp string compute_hash(string const& s) { const int p = 31; const int m = 1e9 + 9; long long hash_value = 0; long long p_pow = 1; for (char c : s) { hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m; p_pow = (p_pow * p) % m; } return to_string(hash_value); } ```

Brownstone

Sunday 9 May 2021

LeetCode - 1062. Longest Repeating Substring

A Fun Problem - Math

Report Abuse