> ## Documentation Index
> Fetch the complete documentation index at: https://hanzla.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# 04 group anagrams

# Group Anagrams

🔗 **NeetCode Problem**:\
[https://neetcode.io/problems/anagram-groups](https://neetcode.io/problems/anagram-groups)

🔗 **LeetCode Problem**:\
[https://leetcode.com/problems/group-anagrams/](https://leetcode.com/problems/group-anagrams/)

***

## Problem Statement

Given an array of strings `strs`, group the anagrams together. You can return the answer in any order.

An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

**Example:**

* Input: `["eat","tea","tan","ate","nat","bat"]`
* Output: `[["bat"],["nat","tan"],["ate","eat","tea"]]`

***

## Approach 1: Brute Force (Sorting)

### Idea

Sort each string and use the sorted version as a key. All anagrams will have the same sorted representation, so we can group them together in a hash map.

### Pseudocode

```
1. Create an unordered_map where key is sorted string
2. For each string in input:
   a. Sort the string
   b. Use sorted string as key
   c. Add original string to the vector mapped by this key
3. Extract all vectors from the map and return
```

### C++ Implementation

```cpp theme={null}
class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        vector<vector<string>> ans;
        unordered_map<string, vector<string>> umap;
        
        for(string &str : strs) {
            string original = str;
            sort(str.begin(), str.end());
            umap[str].push_back(original);
        }
        
        for(auto x : umap) {
            vector<string> temp = x.second;
            ans.push_back(temp);
        }
        
        return ans;
    }
};
```

### Complexity Analysis

* **Time Complexity**: **O(n \* k log k)** → Where `n` is the number of strings and `k` is the maximum length of a string. Each string is sorted which takes `O(k log k)`, and we do this for all `n` strings.
* **Space Complexity**: **O(n \* k)** → We store all strings in the hash map.

### Why it's not optimal?

Sorting each string takes `O(k log k)` time. If strings are very long, this becomes the bottleneck. We can do better by counting character frequencies instead of sorting.

***

## Optimized Approach: Character Frequency Counting

### Idea

Instead of sorting, count the frequency of each character (a-z) in every string. Use this frequency array as the key. All anagrams will have identical frequency arrays.

### Key Insight

Sorting is unnecessary! Two strings are anagrams if and only if they have the same character frequencies. A frequency array is a direct fingerprint of a string's composition and is faster to compute than sorting.

### How it works:

1. Create a frequency array of size 26 (for a-z)
2. For each string, count character frequencies
3. Use the frequency array as the key in a map
4. Group strings by their frequency fingerprint
5. Extract all groups from the map

### C++ Implementation

```cpp theme={null}
class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        map<vector<int>, vector<string>> mp;
        vector<vector<string>> res;

        for(string str : strs) {
            vector<int> freq(26, 0);
            for(char ch : str)
                freq[ch - 'a']++;
            mp[freq].push_back(str);
        }

        for(auto m : mp)
            res.push_back(m.second);

        return res;
    }
};
```

### Complexity Analysis

* **Time Complexity**: **O(n \* k)** → Where `n` is the number of strings and `k` is the maximum length of a string. We iterate through each character once for each string. No sorting involved!
* **Space Complexity**: **O(n \* k)** → We store all strings in the map.

### Why it's optimal?

We eliminate the sorting step, reducing time complexity from `O(n * k log k)` to `O(n * k)`. Character frequency counting is linear and much faster than sorting.

***

## Comparison: Brute Force vs Optimized

| Approach            | Time Complexity | Space Complexity | Pros                                              | Cons                                        |
| ------------------- | --------------- | ---------------- | ------------------------------------------------- | ------------------------------------------- |
| **Sorting**         | O(n \* k log k) | O(n \* k)        | Simple to understand, works with strings directly | Slower due to sorting overhead              |
| **Frequency Array** | O(n \* k)       | O(n \* k)        | **Faster! Linear time per string**                | Requires understanding of frequency mapping |

***

## Why Frequency Array Approach is Most Optimal

1. **Linear character processing**: We count frequencies in O(k) time instead of O(k log k) for sorting
2. **No sorting overhead**: Avoids expensive comparison operations inherent in sorting
3. **Scalable**: Better performance as string lengths increase
4. **Direct fingerprinting**: Frequency arrays directly represent what makes strings anagrams
5. **Cache-friendly**: Sequential array iteration is cache-efficient
6. **Predictable performance**: No worst-case slowdown (unlike some sorting algorithms)

***

## 🚨 Important: Why Use `map` Instead of `unordered_map`?

### The Problem with `unordered_map<vector<int>, ...>`

```cpp theme={null}
// ❌ This WILL NOT COMPILE
unordered_map<vector<int>, vector<string>> mp;
```

**Reason**: `unordered_map` requires a hash function for its keys. While `vector<int>` has comparison operators (needed for `map`), it doesn't have a built-in hash function (needed for `unordered_map`).

### Why `map` Works

```cpp theme={null}
// ✅ This COMPILES and WORKS
map<vector<int>, vector<string>> mp;
```

`map` is a **tree-based map** (implemented as a Red-Black Tree) that only needs comparison operators (`<`), which `vector<int>` has by default.

### Comparison: `map` vs `unordered_map`

| Feature                   | `map`               | `unordered_map`              |
| ------------------------- | ------------------- | ---------------------------- |
| **Data Structure**        | Red-Black Tree      | Hash Table                   |
| **Requires**              | `operator<`         | Hash function + `operator==` |
| **Lookup Time**           | O(log n)            | O(1) average                 |
| **Key Requirements**      | Any comparable type | Must have hash function      |
| **Insertion Time**        | O(log n)            | O(1) average                 |
| **Best for this problem** | ✅ Works immediately | ❌ Requires custom hash       |

### Can We Use `unordered_map`? Yes, But It's Complex

You'd need to define a custom hash function:

```cpp theme={null}
struct VectorHash {
    size_t operator()(const vector<int>& v) const {
        size_t hash = 0;
        for(int val : v) {
            hash ^= std::hash<int>{}(val) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
        }
        return hash;
    }
};

unordered_map<vector<int>, vector<string>, VectorHash> mp;
```

**Why we don't do this**: It adds unnecessary complexity. `map` works fine, and the O(log n) lookup vs O(1) lookup difference is negligible for this problem.

### Rule of Thumb for Future Problems

1. **Can you use `map`?** → Check if your key type is comparable (has `<` operator)
   * If YES: Use `map` (simpler, always works)
   * If NO: Go to step 2

2. **Does your key type have a built-in hash?** → Basic types like `int`, `string`, `long long` have built-in hashes
   * If YES: Use `unordered_map` (faster)
   * If NO: Go to step 3

3. **Can you define a custom hash function?** → For complex types like `vector`, `pair`, `array`
   * If YES and it's worth it: Use `unordered_map` with custom hash
   * If NO or too complex: Use `map`

***

## Example Walkthroughs

### Example 1: Basic Example

**Input:** `["eat","tea","tan","ate","nat","bat"]`

**Using Frequency Array Approach:**

```
Processing "eat":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, e=1 at index 4, t=1 at index 19)
  
Processing "tea":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "eat" - anagram!)
  
Processing "tan":
  Frequency: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, n=1 at index 13, t=1 at index 19)
  
Processing "ate":
  Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "eat" and "tea" - anagram!)
  
Processing "nat":
  Frequency: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0]
             (same as "tan" - anagram!)
  
Processing "bat":
  Frequency: [1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
             (a=1 at index 0, b=1 at index 1, t=1 at index 19)
             (only 'a', 'b', 't' - unique)

Map after processing:
  [a=1,e=1,t=1] → ["eat", "tea", "ate"]
  [a=1,n=1,t=1] → ["tan", "nat"]
  [a=1,b=1,t=1] → ["bat"]
```

**Output:** `[["eat","tea","ate"], ["tan","nat"], ["bat"]]`

### Example 2: Single Characters

**Input:** `["a","b","ba","bb","baa"]`

**Using Frequency Array Approach:**

```
"a":  [1,0,0,...] → key represents single 'a'
"b":  [0,1,0,...] → key represents single 'b'
"ba": [1,1,0,...] → key represents one 'a' and one 'b'
"bb": [0,2,0,...] → key represents two 'b's
"baa":[2,1,0,...] → key represents two 'a's and one 'b'

Map groups:
  [1,0,0,...] → ["a"]
  [0,1,0,...] → ["b"]
  [1,1,0,...] → ["ba"]
  [0,2,0,...] → ["bb"]
  [2,1,0,...] → ["baa"]
```

**Output:** `[["a"], ["b"], ["ba"], ["bb"], ["baa"]]`

### Example 3: Identical Anagrams

**Input:** `["abc","bca","cab"]`

**Using Frequency Array Approach:**

```
"abc": [1,1,1,0,0,...] → one 'a', one 'b', one 'c'
"bca": [1,1,1,0,0,...] → same frequency (anagram!)
"cab": [1,1,1,0,0,...] → same frequency (anagram!)

Map groups:
  [1,1,1,0,0,...] → ["abc", "bca", "cab"]
```

**Output:** `[["abc", "bca", "cab"]]`

***

## Key Takeaways

* **Anagrams have identical character frequencies**: This is the core insight that makes the solution work.
* **Frequency counting beats sorting**: O(n*k) is faster than O(n*k log k) for this problem.
* **Use `map` when key type is comparable**: It's simpler than implementing custom hash functions.
* **`vector<int>` works as a key in `map`**: Because vectors are comparable by default.
* **`vector<int>` doesn't work in `unordered_map`**: Unless you define a custom hash function.
* **Linear traversal is often better than sorting**: When you just need to count/group, avoid expensive sorting operations.
* **Understand data structure requirements**: Different containers have different requirements for their key types.
