187. Repeated DNA Sequences

Description

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Constraints

Approach

Examples

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Solutions

/**
 * Time complexity : O((N−L)L), that is O(N) for the constant L=10. In the 
 *    loop executed N−L+1 one builds a substring of length L. Overall that 
 *    results in O((N−L)L) time complexity.
 * Space complexity : O((N−L)L) to keep the hashset, that results in O(N) 
 *		for the constant L=10.
 */

class Solution {
	public List<String> findRepeatedDnaSequences(String s) {
		int L = 10, n = s.length();
		HashSet<String> seen = new HashSet(), output = new HashSet();

		// iterate over all sequences of length L
		for (int start = 0; start < n - L + 1; ++start) {
			String tmp = s.substring(start, start + L);
			if(seen.contains(tmp)) output.add(tmp);
			seen.add(tmp);
		}

		return new ArrayList<String>(output);
	}
}

Follow up

Last updated

Was this helpful?