The Blind 75 Leetcode Series: Longest Substring Without Repeating Characters

Today, we are working on 3. Longest Substring Without Repeating Characters.

Given a string s, find the length of the longest substring without repeating characters.

Very short description, which opens up to a lot of questions that can be asked. Some of clarifications I can think of and why:

  1. do we need to worry about empty string? This dictates if we want to write code to take care of string length=0.
  2. It does mention characters, but are these simply a to z or are do we want to count numbers and special characters as well? We probably will treat all different characters the same, but good to know what we are dealing with.
  3. Do we count capital case letters the same as lower case? Are “A” and “a” the same? We care because we want to make sure we return the correct count.

Let’s say the answers above are “Yes”, “We care about all characters”, and “Treat A and a differently”, then the question now becomes:

Given a string s, which can be empty, find the length of the longest substring without repeating characters. Upper case and lower case of the same letters are treated as different characters, and we will expect special characters such as !@# as well.

All of a sudden the question becomes clearer and we understand what we are dealing with more explicitly. Let’s move on.

Let’s start with the brute force approach. The simplest way to do it is to iterate through each character once, then iterate through the rest of the characters to see if we see any repeating character. If yes, track the number of characters we between starting point and point where we hit repeating character, then record the number. If the number is greater than the previous max, we replace it as new max. Otherwise, start the next loop.

Here we need a simple tracker to check if a character has appeared before, so an easy and fast look-up table would be using a set. A set stores unique values once only, and the look-up cost is O(1). This should be efficient enough in terms of tracking.

The code will look like……

if len(s) == 0:  # take care of empty string
return 0
max_sub_len = 1 # we always have 1 character in tracker
for i in range(len(s)):
tracker = set()
tracker.add(s[i]) # store the first value
for j in range(i+1, len(s)):
if s[j] not in tracker:
tracker.add(s[j])
else:
max_sub_len = max(max_sub_len, len(tracker))
break # end the process for this substring check
max_sub_len = max(max_sub_len, len(tracker)) # remember to run this one last time if the case of no-repeating character found
return max_sub_len

What’s the time complexity of this appraoch? We are 2 for-loops with linear complexity, so the time complexity is simply O(n²).

if len(s) == 0:  # O(1) here
return 0
max_sub_len = 1
for i in range(len(s)): # first loop, O(n)
tracker = set()
tracker.add(s[i]) # O(1)
for j in range(i+1, len(s)): # second loop, O(n)
if s[j] not in tracker: # O(1)
tracker.add(s[j]) # O(1)
else:
max_sub_len = max(max_sub_len, len(tracker)) # O(1)
break
max_sub_len = max(max_sub_len, len(tracker)) # O(1)
return max_sub_len

Alright, can we do better? Can we track the number without going through the string twice?

Usually when we want to eliminate a loop, we need to track more information. What if we not only track the characters seen but also the location? We then have to change the set to a dict, which is a key-value tracker with key being the character itself and value being the location.

But which location? Location of first appearance? Location of last appearance? All locations?

I think it makes sense to track the first appearance of a character in a substring, but we want to update it every time we see a repeating character. If position 0 is an “a” and position 4 is also an “a”, when we see the “a” in position 4, we want to update the tracker since that’s the new position where “a” should be.

On top of that, we also want to track the beginning of a substring and the current position. Since we are looping through the string, the current position will be where the for-loop’s pointer is. No need to worry about it. We just need another pointer for the starting position of a substring. So when we hit a repeating character:

  • take current and starting pointers’ positions, find length
  • compare with previous max length, find new max if there’s one
  • update the starting position to the repeating chracter + 1
  • update the repeating character’s location value to the current index
  • repeat the process

Now translate this into code

if len(s) == 0:  # the same here
return 0
tracker = {}
start = 0
max_sub_length = 1
for i in range(len(s)):
if s[i] in tracker:
max_sub_length = max(max_sub_length, i - start)
start = tracker[s[i]] + 1
tracker[s[i]] = i
else:
tracker[s[i]] = i
max_sub_length = max(max_sub_length, i - start + 1)

Annnnnd it fails for case abba, hmm……. seems like when it hits the second a, the starting position will be pushed back from 2 to 1, and then my final max check fails. How do I improve this?

What if we just add one more condition to the if statement and make sure the start has to be less than the tracked position for a given letter for us to change the start?

Namely, if s[i] in tracker: becomes if s[i] in tracker and start <= tracker[s[i]]:

Now we eliminates this problem with the abba case and the code becomes

if len(s) == 0:  # the same here
return 0
tracker = {}
start = 0
max_sub_length = 1
for i in range(len(s)):
if s[i] in tracker and start <= tracker[s[i]]:
max_sub_length = max(max_sub_length, i - start)
start = tracker[s[i]] + 1
tracker[s[i]] = i
else:
tracker[s[i]] = i
max_sub_length = max(max_sub_length, i - start + 1)

Boom! This works! But we have some cleaning up to do. We can see that tracker[s[i]] is being called regardless of the if statement, so let’s move it out. On the other hand, since we are coding in python, let’s make the code more pythonic (cleaner) by using enumerate to get both index and the value.

if len(s) == 0:  # the same here
return 0
tracker = {}
start = 0
max_sub_length = 1
for i, c in enumerate(s):
if c in tracker and start <= tracker[c]:
max_sub_length = max(max_sub_length, i - start)
start = tracker[c] + 1
tracker[c] = i
max_sub_length = max(max_sub_length, i - start + 1)

Now, the eyesore, max_sub_length = max(max_sub_length, i — start + 1), makes the code broken. Do we need to do it? Or can we make sure to write this only once? Do we really need to do it again at the very end?

If the start moves, we are not going to have a longer substring than before, so taking max inside that if statement seems off with this approach. What if we just run it in the else so that every time we DON’T see start moving, we check the length. Seems viable, right?

if len(s) == 0:  # the same here
return 0
tracker = {}
start = 0
max_sub_length = 1
for i, c in enumerate(s):
if c in tracker and start <= tracker[c]:
start = tracker[c] + 1
else:
max_sub_length = max(max_sub_length, i - start + 1) # make sure to include i-th index since it's zero-indexed
tracker[c] = i

Now we can remove the last line as well.

And here it is! A concise and efficient solution. Let’s look at the time complexity again.

if len(s) == 0:  # O(1)
return 0
tracker = {}
start = 0
max_sub_length = 1
for i, c in enumerate(s): # O(n)
if c in tracker and start <= tracker[c]:
start = tracker[c] + 1
else:
max_sub_length = max(max_sub_length, i - start + 1)
tracker[c] = i

We have successfully implemented a solution with O(n) time.

Buy my a coffee: https://www.buymeacoffee.com/jonathanckz

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store