Given a string and a positive integer k, find all distinct substrings of a given string of any length containing exactly k distinct characters.
For example,
A simple solution is to generate all substrings of the given string and return substrings containing exactly k
distinct characters. The time complexity of this solution is O(n3), since it takes O(n2) time to generate all substrings and O(n) time to process each substring.
We can improve the time complexity using extra space. The idea is to process substrings starting with each index i
in the input string. In each iteration of the loop, add each character present in the processed substring to a hash table. If the count of distinct characters becomes k
at index j
at any point of time, include the substring str[i…j]
in the result.
Following is the C++, Java, and Python program that demonstrates it:
C++
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
|
#include <iostream>
#include <string>
#include <unordered_set>
using namespace std;
// Function to find all distinct substrings containing exactly `k` distinct characters
unordered_set<string> getDistinctSubstrings(string str, int k)
{
// create a hash set to store substrings containing exactly `k` distinct characters
unordered_set<string> result;
// in each iteration of the loop, consider substring starting with `str[i]`
for (int i = 0; i < str.length() – k + 1; i++)
{
// create a hash set to store distinct characters in the current substring
unordered_set<char> chars;
// Process substring starting with `str[i]`
for (int j = i; j < str.length() && chars.size() <= k; j++)
{
// insert the current character `str[j]` to the hash set
chars.insert(str[j]);
// if the current character `str[j]` is seen before in the
// substring `str[i…j-1]`, the count remains the same since
// the hash set only allows unique values
// if the count of distinct characters becomes `k`
if (chars.size() == k)
{
// add the current substring to result
result.insert(str.substr(i, j – i + 1));
}
}
}
return result;
}
int main()
{
string str = “abcadce”;
int k = 4;
unordered_set<string> result = getDistinctSubstrings(str, k);
for (string s: result) {
cout << s << endl;
}
return 0;
}
|
Output:
adce
abcadc
bcadc
abcad
bcad
cadce
Java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
|
import java.util.HashSet;
import java.util.Set;
class Main
{
// Function to find all distinct substrings containing exactly `k` distinct chars
public static Set<String> getDistinctSubstrings(String str, int k)
{
// create a hash set to store substrings containing exactly `k` distinct chars
Set<String> result = new HashSet<>();
// in each iteration of the loop, consider substring starting with `str[i]`
for (int i = 0; i < str.length() – k + 1; i++)
{
// create a hash set to store distinct characters in the current substring
Set<Character> chars = new HashSet<>();
// Process substring starting with `str[i]`
for (int j = i; j < str.length() && chars.size() <= k; j++)
{
// insert the current character `str[j]` to the hash set
chars.add(str.charAt(j));
// if the current character `str[j]` is seen before in the
// substring `str[i…j-1]`, the count remains the same since
// the hash set only allows unique values
// if the count of distinct characters becomes `k`
if (chars.size() == k) {
// add the current substring to result
result.add(str.substring(i, j + 1));
}
}
}
return result;
}
public static void main(String[] args)
{
String str = “abcadce”;
int k = 4;
Set<String> result = getDistinctSubstrings(str, k);
System.out.println(result);
}
}
|
Output:
[abcad, abcadc, cadce, adce, bcad, bcadc]
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
|
# Function to find all distinct substrings containing exactly `k` distinct characters
def getDistinctSubstrings(str, k):
# create a hash set to store substrings containing exactly `k` distinct characters
result = set()
# in each iteration of the loop, consider substring starting with `str[i]`
for i in range(len(str) – k + 1):
# create a hash set to store distinct characters in the current substring
chars = set()
# Process substring starting with `str[i]`
for j in range(i, len(str)):
# insert the current character `str[j]` to the hash set
chars.add(str[j])
# if the current character `str[j]` is seen before in the
# substring `str[i…j-1]`, the count remains the same since
# the hash set only allows unique values
# if the count of distinct characters becomes `k`
if len(chars) == k:
# add the current substring to result
result.add(str[i: j + 1])
return result
if __name__ == ‘__main__’:
str = “abcadce”
k = 4
result = getDistinctSubstrings(str, k)
print(result)
|
Output:
{‘bcad’, ‘adce’, ‘cadce’, ‘bcadc’, ‘abcad’, ‘abcadc’}
Download Run Code
Output:
adce
abcadc
bcadc
abcad
bcad
cadce
Download Run Code
Output:
[abcad, abcadc, cadce, adce, bcad, bcadc]
Download Run Code
Output:
{‘bcad’, ‘adce’, ‘cadce’, ‘bcadc’, ‘abcad’, ‘abcadc’}
The time complexity of the above solution is O(n2), and the auxiliary space used by the program is O(n).
Exercise: Modify the solution to find substrings of length k
containing exactly k
distinct characters.