# The Blind 75 Leetcode Series: Group Anagrams

Today, we’ll be working on 49. Group Anagrams

Given an array of strings `strs`, group the anagrams together. You can return the answer in any order.

An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Constraints:

`1 <= strs.length <= 104`

`0 <= strs[i].length <= 100`

`strs[i]` consists of lowercase English letters.

This one is quite simple. We have a list of strings, each string consists of lowercase English letters, and we want to group them by the same anagrams.

Without the constraints, some questions we can ask would be

1. Are the strings always gonna be alphanumerical?
2. Are there uppercase and lowercase of the same letter? Do we treat them as the same letter?

If there are both uppercase and lowercase of the same letter and we should treat them the same, then our approach will need some tweaking, but for now, let’s just work with lowercase.

The intuition is that, for the anagrams, they consist of the same letters, so if we have a way to turn them into the same sequence, then they will be the exactly same strings. The easiest way to do it? Use `sort`.

`sorted_string = sorted(s)`

with Python, `sorted` returns a new list while `sort` sorts in place. We are using `sorted` here as we want to modify the string even more.

We now need a tracker to store these sorted lists. As usual, a dictionary will do the job just fine. However, we cannot use a list as key since it’s mutable, so we want to transform it back to a string.

`from collections import defaultdict # python3 import. py2, which you shouldn't be using anymore, has a different routetracker = defaultdict(list)sorted_s = ''.join(sorted(s))  # s is a string in the listtracker[sorted_s].append(s)`

A bit on explanation for `defaultdict` for those unfamiliar. `defaultdict` is a simple way to deal with newly added keys. Without it, we will need to do something like

`if sorted_s in tracker:    tracker[sorted_s].append(s)else:    tracker[sorted_s] = [s]`

`defaultdict` gets rid of the if-else and you can assume the key exists. This dictionary defaults each key to an empty list, and you can directly append values to it. If a non-existing key is called, it’ll return the default value.

NOTE: `defaultdict(int)` would default each key to `0` and `defaultdict(str)` defaults each key to `''` and so on.

We now just have to sort every string and check against the tracker. All strings with the same sorted string will be grouped together under the same key, and we just need to return all the values of the tracker.

We can add some more logic to do early return on the one-element of no-element list.

It will look like

`def groupAnagrams(strs):    if len(strs) <= 1:        return [strs]    tracker = defaultdict(list)    for s in strs:        sorted_s = ''.join(sorted(s))        tracker[sorted_s].append(s)    return tracker.values()`

What’s the time complexity for this?

Say we have `n` strings in the input, and each string needs a sort, which is O(m logm), where `m` is the length of the word, as we’ve found in previous problems, so the total time complexity would be O(n * m logm)

`def groupAnagrams(strs):    if len(strs) <= 1:        return [strs]    tracker = defaultdict(list)    for s in strs:  # O(n)        sorted_s = ''.join(sorted(s))  # O(m logm)        tracker[sorted_s].append(s)    return tracker.values()`

Is there a better way to do it?

We spend a great deal on the sorting. What if there’s a way to avoid sorting? Since we only have to deal with lowercase alphabets, we have only 26 letters. We can record the position of the letter for each letter in a string, then form a key with it.

Let’s say we have a list of 26 integers, first one being the position of “a”, second being the position of “b”, and so on. A string “cat” will give us

`[1, 0, 1, 0..............0, 1, 0, 0...........0] 0th  2nd                 19th`

With only 1s on position 0, 2, and 19. An anagram of “cat”, say “act”, will also have the same positions. We then turn this list into a tuple or a string, making it immutable. Now we effectively made keys for our tracker again.

`def groupAnagrams(strs):    tracker = defaultdict(list)    position = [0] * 26  # for 26 lowercase alphabets    for s in strs:        for letter in s:            # ord returns the unicode position of the input            # a - z are continuous in unicode, so we can find            # the diff within 26            position[ord(letter)-ord('a')] += 1        tracker[tuple(position)].append(s)        position = [0] * 26    return tracker.values()`

What’s the time complexity now? We got rid of the sort, but we still need to go through each letter in a string, so we have O(m) for each string and O(n) for n string, effectively making it O(nm).

`def groupAnagrams(strs):    tracker = defaultdict(list)    position = [0] * 26  # for 26 lowercase alphabets    # O(n)    for s in strs:        # O(m)        for letter in s:            position[ord(letter)-ord('a')] += 1        tracker[tuple(position)].append(s)        position = [0] * 26    return tracker.values()`

In an interview, if you give me the first answer, that’s a pass for me. If you give me the second answer, I’ll give you bonus point. First approach is intuitive but it already shows if you have knowledge for list, dict, and grouping. The second approach shows you can find a way to optimize the time complexity.

That’s it! Another problem down!

--

--

--

## More from Jonathan Chao

I am a software developer who has been in this industry for close to a decade. I share my experience to people who are or want to get into the industry

Love podcasts or audiobooks? Learn on the go with our new app.

## Jonathan Chao

I am a software developer who has been in this industry for close to a decade. I share my experience to people who are or want to get into the industry