All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Constraints
Approach
Links
GeeksforGeeks
YouTube
Examples
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
Solutions
/**
* Time complexity : O((N−L)L), that is O(N) for the constant L=10. In the
* loop executed N−L+1 one builds a substring of length L. Overall that
* results in O((N−L)L) time complexity.
* Space complexity : O((N−L)L) to keep the hashset, that results in O(N)
* for the constant L=10.
*/
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
int L = 10, n = s.length();
HashSet<String> seen = new HashSet(), output = new HashSet();
// iterate over all sequences of length L
for (int start = 0; start < n - L + 1; ++start) {
String tmp = s.substring(start, start + L);
if(seen.contains(tmp)) output.add(tmp);
seen.add(tmp);
}
return new ArrayList<String>(output);
}
}