Tutorials on Advanced Math and Computer Science Concepts

Problem 3 - Longest Substring Without Repeated Characters (Medium)

The code for this problem can be found at https://github.com/scprogramming/LeetcodeExplained/tree/master/Problem%20Three%20-%20Longest%20Substring%20without%20Repeating%20Characters%20(Medium)

Problem: Given a string of characters, find the longest substring that has no repeating characters.

This problem on the surface sounds easy to solve, and there are several methods that can be used to solve it. The hard part about this problem is being able to solve it efficiently. A first instinct approach might be to generate all the substrings of the given string, and then remove those with repeating characters. Of those results, compare the lengths, and return the longest one.

This idea does work, however generating every substring of a given string is a complex operation. It takes up a lot of time complexity, and for this problem, we don't need to do that. The solution we will use will take advantage of a few facts that we can find through looking at some example problems. That being said, let's look at a simple example.

Input: abcabcabc

It's clear that this string has a longest substring of "abc", since after this it repeats characters. For that matter, there are actually a few substrings of the same length, such as "bca" or "cab". For this problem, we are assuming that just a single substring of the longest length is enough for us.

Now, think about how you could the longest substring of the input. Likely, your answer wasn't to write down every single substring you could see, then find the longest out of them. Most of the time, what people will do is apply a "sliding window" approach. You start at the first letter and go until you find a repeat. Once this is done, you move up one letter, then do the same. This gives you a result like this:






You can see that this approach is a nicer way of determining the longest substring. The reason it works so well is because you can be certain that if the longest substring starting at a has a repeat, then any other substring of a must be shorter, therefore not the longest substring without a repeat. Therefore, we can eliminate any substring starting at the first letter and start checking from the next letter in the sequence. Let's see how this approach can be coded.

The idea is that we track the characters we have seen in a dictionary or hash table or some similar structure. We get the character that is at our current target index, and we check to see if this is in the set of characters that we have seen. If it is already seen, we check the length of the substring and if it is bigger than the previous, we track it as the new largest substring. Then, we clear the seen characters and current substring, increment our starting point, and start checking again. If the character hasn't been seen yet, we just add the character to the substring and continue checking. At the end, we will have the longest substring possible, and the length of it.

Let's take a look at the time complexity of this algorithm. We can see that we only iterate the string one time, and since the hash table lookup is constant, the number of operations we do on each iteration is minimal. This being the case, we do one operation on each iteration, and iterate at most n times, where n is the number of characters in the string. Therefore, we get an efficiency of O(n).