# The Blind 75 Leetcode Series: Longest Substring Without Repeating Characters

Today, we are working on 3. Longest Substring Without Repeating Characters.

Given a string `s`, find the length of the longest substring without repeating characters.

Very short description, which opens up to a lot of questions that can be asked. Some of clarifications I can think of and why:

1. do we need to worry about empty string? This dictates if we want to write code to take care of string length=0.
2. It does mention characters, but are these simply a to z or are do we want to count numbers and special characters as well? We probably will treat all different characters the same, but good to know what we are dealing with.
3. Do we count capital case letters the same as lower case? Are “A” and “a” the same? We care because we want to make sure we return the correct count.

Let’s say the answers above are “Yes”, “We care about all characters”, and “Treat A and a differently”, then the question now becomes:

Given a string `s`, which can be empty, find the length of the longest substring without repeating characters. Upper case and lower case of the same letters are treated as different characters, and we will expect special characters such as `!@#` as well.

All of a sudden the question becomes clearer and we understand what we are dealing with more explicitly. Let’s move on.

Let’s start with the brute force approach. The simplest way to do it is to iterate through each character once, then iterate through the rest of the characters to see if we see any repeating character. If yes, track the number of characters we between starting point and point where we hit repeating character, then record the number. If the number is greater than the previous max, we replace it as new max. Otherwise, start the next loop.

Here we need a simple tracker to check if a character has appeared before, so an easy and fast look-up table would be using a `set`. A `set` stores unique values once only, and the look-up cost is O(1). This should be efficient enough in terms of tracking.

The code will look like……

`if len(s) == 0:  # take care of empty string    return 0max_sub_len = 1  # we always have 1 character in trackerfor i in range(len(s)):    tracker = set()    tracker.add(s[i])  # store the first value    for j in range(i+1, len(s)):        if s[j] not in tracker:            tracker.add(s[j])        else:            max_sub_len = max(max_sub_len, len(tracker))            break  # end the process for this substring check    max_sub_len = max(max_sub_len, len(tracker))  # remember to run this one last time if the case of no-repeating character foundreturn max_sub_len`

What’s the time complexity of this appraoch? We are 2 for-loops with linear complexity, so the time complexity is simply O(n²).

`if len(s) == 0:  # O(1) here    return 0max_sub_len = 1  for i in range(len(s)):  # first loop, O(n)    tracker = set()    tracker.add(s[i])  # O(1)    for j in range(i+1, len(s)):  # second loop, O(n)        if s[j] not in tracker:  # O(1)            tracker.add(s[j])  # O(1)        else:            max_sub_len = max(max_sub_len, len(tracker))  # O(1)            break    max_sub_len = max(max_sub_len, len(tracker))  # O(1)return max_sub_len`

Alright, can we do better? Can we track the number without going through the string twice?

Usually when we want to eliminate a loop, we need to track more information. What if we not only track the characters seen but also the location? We then have to change the `set` to a `dict`, which is a key-value tracker with key being the character itself and value being the location.

But which location? Location of first appearance? Location of last appearance? All locations?

I think it makes sense to track the first appearance of a character in a substring, but we want to update it every time we see a repeating character. If position 0 is an “a” and position 4 is also an “a”, when we see the “a” in position 4, we want to update the tracker since that’s the new position where “a” should be.

On top of that, we also want to track the beginning of a substring and the current position. Since we are looping through the string, the current position will be where the for-loop’s pointer is. No need to worry about it. We just need another pointer for the starting position of a substring. So when we hit a repeating character:

• take current and starting pointers’ positions, find length
• compare with previous max length, find new max if there’s one
• update the starting position to the repeating chracter + 1
• update the repeating character’s location value to the current index
• repeat the process

Now translate this into code

`if len(s) == 0:  # the same here    return 0tracker = {}start = 0max_sub_length = 1for i in range(len(s)):    if s[i] in tracker:        max_sub_length = max(max_sub_length, i - start)        start = tracker[s[i]] + 1        tracker[s[i]] = i    else:        tracker[s[i]] = imax_sub_length = max(max_sub_length, i - start + 1)`

Annnnnd it fails for case `abba`, hmm……. seems like when it hits the second `a`, the starting position will be pushed back from `2` to `1`, and then my final max check fails. How do I improve this?

What if we just add one more condition to the if statement and make sure the `start` has to be less than the tracked position for a given letter for us to change the `start`?

Namely, `if s[i] in tracker:` becomes `if s[i] in tracker and start <= tracker[s[i]]:`

Now we eliminates this problem with the `abba` case and the code becomes

`if len(s) == 0:  # the same here    return 0tracker = {}start = 0max_sub_length = 1for i in range(len(s)):    if s[i] in tracker and start <= tracker[s[i]]:        max_sub_length = max(max_sub_length, i - start)        start = tracker[s[i]] + 1        tracker[s[i]] = i    else:        tracker[s[i]] = imax_sub_length = max(max_sub_length, i - start + 1)`

Boom! This works! But we have some cleaning up to do. We can see that `tracker[s[i]]` is being called regardless of the if statement, so let’s move it out. On the other hand, since we are coding in python, let’s make the code more pythonic (cleaner) by using `enumerate` to get both index and the value.

`if len(s) == 0:  # the same here    return 0tracker = {}start = 0max_sub_length = 1for i, c in enumerate(s):    if c in tracker and start <= tracker[c]:        max_sub_length = max(max_sub_length, i - start)        start = tracker[c] + 1    tracker[c] = imax_sub_length = max(max_sub_length, i - start + 1)`

Now, the eyesore, `max_sub_length = max(max_sub_length, i — start + 1)`, makes the code broken. Do we need to do it? Or can we make sure to write this only once? Do we really need to do it again at the very end?

If the `start` moves, we are not going to have a longer substring than before, so taking `max` inside that if statement seems off with this approach. What if we just run it in the `else` so that every time we DON’T see `start` moving, we check the length. Seems viable, right?

`if len(s) == 0:  # the same here    return 0tracker = {}start = 0max_sub_length = 1for i, c in enumerate(s):    if c in tracker and start <= tracker[c]:        start = tracker[c] + 1    else:        max_sub_length = max(max_sub_length, i - start + 1)  # make sure to include i-th index since it's zero-indexed    tracker[c] = i`

Now we can remove the last line as well.

And here it is! A concise and efficient solution. Let’s look at the time complexity again.

`if len(s) == 0:  # O(1)    return 0tracker = {}start = 0max_sub_length = 1for i, c in enumerate(s):  # O(n)    if c in tracker and start <= tracker[c]:        start = tracker[c] + 1    else:        max_sub_length = max(max_sub_length, i - start + 1)    tracker[c] = i`

We have successfully implemented a solution with O(n) time.