187. Repeated DNA Sequences
Description
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Constraints
Approach
Links
- GeeksforGeeks 
- YouTube 
Examples
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
Solutions
/**
 * Time complexity : O((N−L)L), that is O(N) for the constant L=10. In the 
 *    loop executed N−L+1 one builds a substring of length L. Overall that 
 *    results in O((N−L)L) time complexity.
 * Space complexity : O((N−L)L) to keep the hashset, that results in O(N) 
 *		for the constant L=10.
 */
class Solution {
	public List<String> findRepeatedDnaSequences(String s) {
		int L = 10, n = s.length();
		HashSet<String> seen = new HashSet(), output = new HashSet();
		// iterate over all sequences of length L
		for (int start = 0; start < n - L + 1; ++start) {
			String tmp = s.substring(start, start + L);
			if(seen.contains(tmp)) output.add(tmp);
			seen.add(tmp);
		}
		return new ArrayList<String>(output);
	}
}Follow up
Last updated
Was this helpful?