Techie Delight
Coding made easy
Given an array and a positive number k, check whether the array contains any duplicate elements within range k. If k is more than size of the array, the solution should check for duplicates in the complete array.
Input:
A[] = { 5, 6, 8, 2, 4, 6, 9 }
k = 4
Output: Duplicates found
(element 6 is repeated at distance 4 which is <= k)
Input:
A[] = { 5, 6, 8, 2, 4, 6, 9 }
k = 2
Output: No Duplicates found
(element 6 is repeated at distance 4 which is > k)
Input:
A[] = { 1, 2, 3, 2, 1 }
k = 7
Output: Duplicates found
(element 1 and 2 is repeated at distance 4 and 2 respectively which are both <= k)
Naive solution is to consider every sub-array of size k and check for duplicates in it. The time complexity of this solution is O(n.k2) since there can be n subarrays of size k and each subarray might take O(k2) time for checking duplicates.
The problem can be efficiently solved using hashing in O(n) time and O(n) extra space. The idea is to traverse the array and store each element along with its index in a map. i.e. (element, index) pairs as (key, value) pairs in a map. If any element is already found present in the map, we simply check if that element is repeated within range of k using its previous occurrence information from the map.
The algorithm can be implemented as follows in C++, Java and Python:
C++
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
|
#include <iostream>
#include <vector>
#include <unordered_map>
using namespace std;
bool hasDuplicates(vector<int> &vec, int k)
{
// stores (element, index) pairs as (key, value) pairs
unordered_map<int, int> map;
// traverse the vector
for (int i = 0; i < vec.size(); i++)
{
// if the current element already exists in the map
if (map.find(vec[i]) != map.end()) {
// return true if current element is repeated within range of k
if (i – map[vec[i]] <= k) {
return true;
}
}
// store elements along with their indices
map[vec[i]] = i;
}
// we reach here when no element is repeated within range k
return false;
}
int main()
{
vector<int> vec = { 5, 6, 8, 2, 4, 6, 9 };
int k = 4;
if (hasDuplicates(vec, k)) {
cout << “Duplicates found”;
}
else {
cout << “No Duplicates found”;
}
return 0;
}
|
Output:
Duplicates found
Java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
|
import java.util.HashMap;
import java.util.Map;
class Main
{
public static boolean hasDuplicates(int[] arr, int k)
{
// stores (element, index) pairs as (key, value) pairs
Map<Integer, Integer> map = new HashMap<>();
// traverse the array
for (int i = 0; i < arr.length; i++)
{
// if the current element already exists in the map
if (map.containsKey(arr[i])) {
// return true if current element is repeated within range of k
if (i – map.get(arr[i]) <= k) {
return true;
}
}
// store elements along with their indices
map.put(arr[i], i);
}
// we reach here when no element is repeated within range k
return false;
}
public static void main(String[] args) {
int[] arr = { 5, 6, 8, 2, 4, 6, 9 };
int k = 4;
if (hasDuplicates(arr, k)) {
System.out.println(“Duplicates found”);
}
else {
System.out.println(“No Duplicates found”);
}
}
}
|
Output:
Duplicates found
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
|
def hasDuplicates(A, k):
# stores (element, index) pairs as (key, value) pairs
dict = {}
# traverse the list
for i, e in enumerate(A):
# if the current element already exists in the dict
if e in dict:
# return true if current element is repeated within range of k
if i – dict.get(e) <= k:
return True
# store elements along with their indices
dict[e] = i
# we reach here when no element is repeated within range k
return False
if __name__ == ‘__main__’:
A = [5, 6, 8, 2, 4, 6, 9]
k = 4
if hasDuplicates(A, k):
print(“Duplicates found”)
else:
print(“No Duplicates found”)
|
Output:
Duplicates found
Download Run Code
Output:
Duplicates found
Download Run Code
Output:
Duplicates found
Download Run Code
Output:
Duplicates found
We can also use Sliding Window approach to solve above problem. The idea is process every window of size k one at a time and store its elements in a set. Now if any element repeats in the window, then we can say that it is repeated within range of k.
Initially our window will contain the first k elements of the input. Then for each element of the remaining input, we add it to the current window. While adding the i’th element of the input to the current window, we have to remove the (i-k)’th element from the window at the same time. This is done to maintain efficiency of the solution and to keep window balance at any point.
Below is C++, Java and Python implementation based on above idea –
C++
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
|
#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;
// Returns true if element x is present in the given set
bool contains(unordered_set<int> const &set, int x) {
return set.find(x) != set.end();
}
bool hasDuplicates(vector<int> &vec, int k)
{
// create an empty set to store elements within range k
unordered_set<int> window;
// traverse the vector
for (int i = 0; i < vec.size(); i++)
{
// if the current element already exists in the window,
// then it is repeated within range of k
if (contains(window, vec[i])) {
return true;
}
// add current element to the window
window.insert(vec[i]);
// remove the element at k’th range from the current element
if (i >= k) {
window.erase(vec[i – k]);
}
}
// we reach here when no element is repeated within range k
return false;
}
int main()
{
vector<int> vec = { 5, 6, 8, 2, 4, 6, 9 };
int k = 4;
if (hasDuplicates(vec, k)) {
cout << “Duplicates found”;
}
else {
cout << “No Duplicates found”;
}
return 0;
}
|
Output:
Duplicates found
Java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
|
import java.util.HashSet;
import java.util.Set;
class Main
{
public static boolean hasDuplicates(int[] arr, int k)
{
// create an empty set to store elements within range k
Set<Integer> window = new HashSet<>();
// traverse the array
for (int i = 0; i < arr.length; i++)
{
// if the current element already exists in the window,
// then it is repeated within range of k
if (window.contains(arr[i])) {
return true;
}
// add current element to the window
window.add(arr[i]);
// remove the element at k’th range from the current element
if (i >= k) {
window.remove(arr[i – k]);
}
}
// we reach here when no element is repeated within range k
return false;
}
public static void main(String[] args) {
int[] arr = { 5, 6, 8, 2, 4, 6, 9 };
int k = 4;
if (hasDuplicates(arr, k)) {
System.out.println(“Duplicates found”);
}
else {
System.out.println(“No Duplicates found”);
}
}
}
|
Output:
Duplicates found
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
|
def hasDuplicates(A, k):
# create an empty set to store elements within range k
window = set()
# traverse the list
for i in range(len(A)):
# if the current element already exists in the window,
# then it is repeated within range of k
if A[i] in window:
return True
# add current element to the window
window.add(A[i])
# remove the element at k’th range from the current element
if i >= k:
window.remove(A[i – k])
# we reach here when no element is repeated within range k
return False
if __name__ == ‘__main__’:
A = [5, 6, 8, 2, 4, 6, 9]
k = 4
if hasDuplicates(A, k):
print(“Duplicates found”)
else:
print(“No Duplicates found”)
|
Output:
Duplicates found
Download Run Code
Output:
Duplicates found
Download Run Code
Output:
Duplicates found
Download Run Code
Output:
Duplicates found
The time complexity of above solution is O(n) and uses O(n) extra space.
The complexity mentioned is average case. The worst case complexity is n.k using un-ordered map and set.
I think using map and set the work case complexity will be n.log(k)
So if average case complexity is important than un-ordered map and un-ordered set is the right data struct to use. But if worst case complexity is important than ordered set and map should be used.
Can we use min-heap? O(k +nlogk) and O(1) space?
How would a heap occupy O(1) space?