> ## Documentation Index > Fetch the complete documentation index at: https://hanzla.dev/llms.txt > Use this file to discover all available pages before exploring further. # 04 group anagrams # Group Anagrams 🔗 **NeetCode Problem**:\ [https://neetcode.io/problems/anagram-groups](https://neetcode.io/problems/anagram-groups) 🔗 **LeetCode Problem**:\ [https://leetcode.com/problems/group-anagrams/](https://leetcode.com/problems/group-anagrams/) *** ## Problem Statement Given an array of strings `strs`, group the anagrams together. You can return the answer in any order. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. **Example:** * Input: `["eat","tea","tan","ate","nat","bat"]` * Output: `[["bat"],["nat","tan"],["ate","eat","tea"]]` *** ## Approach 1: Brute Force (Sorting) ### Idea Sort each string and use the sorted version as a key. All anagrams will have the same sorted representation, so we can group them together in a hash map. ### Pseudocode ``` 1. Create an unordered_map where key is sorted string 2. For each string in input: a. Sort the string b. Use sorted string as key c. Add original string to the vector mapped by this key 3. Extract all vectors from the map and return ``` ### C++ Implementation ```cpp theme={null} class Solution { public: vector> groupAnagrams(vector& strs) { vector> ans; unordered_map> umap; for(string &str : strs) { string original = str; sort(str.begin(), str.end()); umap[str].push_back(original); } for(auto x : umap) { vector temp = x.second; ans.push_back(temp); } return ans; } }; ``` ### Complexity Analysis * **Time Complexity**: **O(n \* k log k)** → Where `n` is the number of strings and `k` is the maximum length of a string. Each string is sorted which takes `O(k log k)`, and we do this for all `n` strings. * **Space Complexity**: **O(n \* k)** → We store all strings in the hash map. ### Why it's not optimal? Sorting each string takes `O(k log k)` time. If strings are very long, this becomes the bottleneck. We can do better by counting character frequencies instead of sorting. *** ## Optimized Approach: Character Frequency Counting ### Idea Instead of sorting, count the frequency of each character (a-z) in every string. Use this frequency array as the key. All anagrams will have identical frequency arrays. ### Key Insight Sorting is unnecessary! Two strings are anagrams if and only if they have the same character frequencies. A frequency array is a direct fingerprint of a string's composition and is faster to compute than sorting. ### How it works: 1. Create a frequency array of size 26 (for a-z) 2. For each string, count character frequencies 3. Use the frequency array as the key in a map 4. Group strings by their frequency fingerprint 5. Extract all groups from the map ### C++ Implementation ```cpp theme={null} class Solution { public: vector> groupAnagrams(vector& strs) { map, vector> mp; vector> res; for(string str : strs) { vector freq(26, 0); for(char ch : str) freq[ch - 'a']++; mp[freq].push_back(str); } for(auto m : mp) res.push_back(m.second); return res; } }; ``` ### Complexity Analysis * **Time Complexity**: **O(n \* k)** → Where `n` is the number of strings and `k` is the maximum length of a string. We iterate through each character once for each string. No sorting involved! * **Space Complexity**: **O(n \* k)** → We store all strings in the map. ### Why it's optimal? We eliminate the sorting step, reducing time complexity from `O(n * k log k)` to `O(n * k)`. Character frequency counting is linear and much faster than sorting. *** ## Comparison: Brute Force vs Optimized | Approach | Time Complexity | Space Complexity | Pros | Cons | | ------------------- | --------------- | ---------------- | ------------------------------------------------- | ------------------------------------------- | | **Sorting** | O(n \* k log k) | O(n \* k) | Simple to understand, works with strings directly | Slower due to sorting overhead | | **Frequency Array** | O(n \* k) | O(n \* k) | **Faster! Linear time per string** | Requires understanding of frequency mapping | *** ## Why Frequency Array Approach is Most Optimal 1. **Linear character processing**: We count frequencies in O(k) time instead of O(k log k) for sorting 2. **No sorting overhead**: Avoids expensive comparison operations inherent in sorting 3. **Scalable**: Better performance as string lengths increase 4. **Direct fingerprinting**: Frequency arrays directly represent what makes strings anagrams 5. **Cache-friendly**: Sequential array iteration is cache-efficient 6. **Predictable performance**: No worst-case slowdown (unlike some sorting algorithms) *** ## 🚨 Important: Why Use `map` Instead of `unordered_map`? ### The Problem with `unordered_map, ...>` ```cpp theme={null} // ❌ This WILL NOT COMPILE unordered_map, vector> mp; ``` **Reason**: `unordered_map` requires a hash function for its keys. While `vector` has comparison operators (needed for `map`), it doesn't have a built-in hash function (needed for `unordered_map`). ### Why `map` Works ```cpp theme={null} // ✅ This COMPILES and WORKS map, vector> mp; ``` `map` is a **tree-based map** (implemented as a Red-Black Tree) that only needs comparison operators (`<`), which `vector` has by default. ### Comparison: `map` vs `unordered_map` | Feature | `map` | `unordered_map` | | ------------------------- | ------------------- | ---------------------------- | | **Data Structure** | Red-Black Tree | Hash Table | | **Requires** | `operator<` | Hash function + `operator==` | | **Lookup Time** | O(log n) | O(1) average | | **Key Requirements** | Any comparable type | Must have hash function | | **Insertion Time** | O(log n) | O(1) average | | **Best for this problem** | ✅ Works immediately | ❌ Requires custom hash | ### Can We Use `unordered_map`? Yes, But It's Complex You'd need to define a custom hash function: ```cpp theme={null} struct VectorHash { size_t operator()(const vector& v) const { size_t hash = 0; for(int val : v) { hash ^= std::hash{}(val) + 0x9e3779b9 + (hash << 6) + (hash >> 2); } return hash; } }; unordered_map, vector, VectorHash> mp; ``` **Why we don't do this**: It adds unnecessary complexity. `map` works fine, and the O(log n) lookup vs O(1) lookup difference is negligible for this problem. ### Rule of Thumb for Future Problems 1. **Can you use `map`?** → Check if your key type is comparable (has `<` operator) * If YES: Use `map` (simpler, always works) * If NO: Go to step 2 2. **Does your key type have a built-in hash?** → Basic types like `int`, `string`, `long long` have built-in hashes * If YES: Use `unordered_map` (faster) * If NO: Go to step 3 3. **Can you define a custom hash function?** → For complex types like `vector`, `pair`, `array` * If YES and it's worth it: Use `unordered_map` with custom hash * If NO or too complex: Use `map` *** ## Example Walkthroughs ### Example 1: Basic Example **Input:** `["eat","tea","tan","ate","nat","bat"]` **Using Frequency Array Approach:** ``` Processing "eat": Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0] (a=1 at index 0, e=1 at index 4, t=1 at index 19) Processing "tea": Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0] (same as "eat" - anagram!) Processing "tan": Frequency: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0] (a=1 at index 0, n=1 at index 13, t=1 at index 19) Processing "ate": Frequency: [1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0] (same as "eat" and "tea" - anagram!) Processing "nat": Frequency: [1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0] (same as "tan" - anagram!) Processing "bat": Frequency: [1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0] (a=1 at index 0, b=1 at index 1, t=1 at index 19) (only 'a', 'b', 't' - unique) Map after processing: [a=1,e=1,t=1] → ["eat", "tea", "ate"] [a=1,n=1,t=1] → ["tan", "nat"] [a=1,b=1,t=1] → ["bat"] ``` **Output:** `[["eat","tea","ate"], ["tan","nat"], ["bat"]]` ### Example 2: Single Characters **Input:** `["a","b","ba","bb","baa"]` **Using Frequency Array Approach:** ``` "a": [1,0,0,...] → key represents single 'a' "b": [0,1,0,...] → key represents single 'b' "ba": [1,1,0,...] → key represents one 'a' and one 'b' "bb": [0,2,0,...] → key represents two 'b's "baa":[2,1,0,...] → key represents two 'a's and one 'b' Map groups: [1,0,0,...] → ["a"] [0,1,0,...] → ["b"] [1,1,0,...] → ["ba"] [0,2,0,...] → ["bb"] [2,1,0,...] → ["baa"] ``` **Output:** `[["a"], ["b"], ["ba"], ["bb"], ["baa"]]` ### Example 3: Identical Anagrams **Input:** `["abc","bca","cab"]` **Using Frequency Array Approach:** ``` "abc": [1,1,1,0,0,...] → one 'a', one 'b', one 'c' "bca": [1,1,1,0,0,...] → same frequency (anagram!) "cab": [1,1,1,0,0,...] → same frequency (anagram!) Map groups: [1,1,1,0,0,...] → ["abc", "bca", "cab"] ``` **Output:** `[["abc", "bca", "cab"]]` *** ## Key Takeaways * **Anagrams have identical character frequencies**: This is the core insight that makes the solution work. * **Frequency counting beats sorting**: O(n*k) is faster than O(n*k log k) for this problem. * **Use `map` when key type is comparable**: It's simpler than implementing custom hash functions. * **`vector` works as a key in `map`**: Because vectors are comparable by default. * **`vector` doesn't work in `unordered_map`**: Unless you define a custom hash function. * **Linear traversal is often better than sorting**: When you just need to count/group, avoid expensive sorting operations. * **Understand data structure requirements**: Different containers have different requirements for their key types.