> ## Documentation Index > Fetch the complete documentation index at: https://hanzla.dev/llms.txt > Use this file to discover all available pages before exploring further. # 06 encode decode strings # String Encode and Decode 🔗 **NeetCode Problem**: [https://neetcode.io/problems/encode-and-decode-strings](https://neetcode.io/problems/encode-and-decode-strings) 🔗 **LeetCode Problem**: [https://leetcode.com/problems/encode-and-decode-strings](https://leetcode.com/problems/encode-and-decode-strings) *** ## Problem Statement Design an algorithm to encode a list of strings to a single string, and decode the encoded string back to the original list of strings. **Constraints:** * The string may contain any possible characters. * Strings can be empty. * You cannot use external libraries like base64 encoding. **Example:** * Input: `["Hello","World"]` * Encoded: `5#Hello5#World` * Decoded: `["Hello","World"]` *** ## Approach 1: Brute Force ### Idea Use a simple delimiter (like comma) to separate strings. However, this fails if the delimiter appears in the string itself. ### Pseudocode ``` encode(strs): result = "" for each str in strs: result += str + "," return result decode(s): return s.split(",") ``` ### C++ Implementation ```cpp theme={null} class Solution { public: string encode(vector& strs) { string result = ""; for(string str : strs) result += str + ","; return result; } vector decode(string s) { vector result; string current = ""; for(char c : s) { if(c == ',') { result.push_back(current); current = ""; } else { current += c; } } return result; } }; ``` ### Complexity Analysis * **Time Complexity**: **O(n)** → Where n is the total number of characters across all strings * **Space Complexity**: **O(n)** → For storing the encoded result ### Why it's not optimal? If any string contains the delimiter character (comma in this case), the decoding will break. For example: * Input: `["a,b", "c"]` * Encoded: `a,b,c,` * Decoded: `["a", "b", "c"]` ❌ (Wrong!) This approach cannot handle strings with delimiters in them. *** ## Optimized Approach: Length-Prefixed Encoding ### Idea Instead of using a delimiter, encode the length of each string followed by a separator, then the string itself. This way, we know exactly how many characters belong to each string, making it impossible for content to interfere with the format. ### Key Insight By storing the **length of each string** before the string itself, we can unambiguously decode any string, regardless of its content. The format becomes: `length#string` where `#` is a separator that guides our parsing, not a content delimiter. ### How it works: 1. **Encoding**: For each string, append its length, followed by `#`, followed by the string itself 2. **Decoding**: Parse the length until we hit `#`, then read exactly that many characters for the string 3. Repeat until we've decoded all strings ### C++ Implementation ```cpp theme={null} class Solution { public: string encode(vector& strs) { string result = ""; for(string str : strs) result += to_string(str.size()) + "#" + str; return result; } vector decode(string s) { vector result; int i = 0; while(i < s.size()) { int j = i; // Find the '#' that separates length from string while(s[j] != '#') j++; // Extract the length int len = stoi(s.substr(i, j - i)); // Extract the string of that length string word = s.substr(j + 1, len); result.push_back(word); // Move to next string i = j + 1 + len; } return result; } }; ``` ### Complexity Analysis * **Time Complexity**: **O(n)** → Where n is the total number of characters. We process each character exactly once. * **Space Complexity**: **O(n)** → For storing the encoded result (or decoded result in space-efficient implementation) ### Why it's optimal? 1. Works with any character in the string, including `#`, commas, or any special character 2. Linear time complexity - we never backtrack or revisit characters 3. Linear space complexity - only storing the output 4. Simple and elegant implementation 5. No edge cases that break the algorithm *** ## Comparison: Brute Force vs Optimized | Approach | Time Complexity | Space Complexity | Pros | Cons | | ---------------------------- | --------------- | ---------------- | ------------------------------------------------ | ------------------------------ | | **Brute Force (Delimiter)** | O(n) | O(n) | Simple to understand | Fails with delimiter in string | | **Length-Prefixed Encoding** | O(n) | O(n) | **Handles all characters**, Unambiguous decoding | Slightly more complex parsing | *** ## Why Length-Prefixed Encoding is Most Optimal 1. **Universal Delimiter Independence** - No character can break the format since length dictates how many chars to read 2. **Linear Efficiency** - Single pass through the data for both encoding and decoding 3. **Handles All Edge Cases** - Empty strings, special characters, very long strings all work perfectly 4. **Deterministic Parsing** - No ambiguity; each byte of input maps to a unique output 5. **Industry Standard** - Used in protocols like gRPC, Protobuf, and network communication 6. **No External Dependencies** - Pure algorithm, no external libraries needed *** ## Example Walkthroughs ### Example 1: Simple Strings **Input**: `["Hello", "World"]` **Using Length-Prefixed Encoding:** ``` Encoding: - "Hello" has length 5 → "5#Hello" - "World" has length 5 → "5#World" - Result: "5#Hello5#World" Decoding "5#Hello5#World": - i=0: Find '#' at j=1 - Length = 5 - Read 5 chars from j+1: "Hello" - i = 1 + 1 + 5 = 7 - i=7: Find '#' at j=8 - Length = 5 - Read 5 chars from j+1: "World" - i = 8 + 1 + 5 = 14 (end) - Result: ["Hello", "World"] ✓ ``` Strings successfully encoded and decoded! ### Example 2: Strings with Special Characters **Input**: `["a#b", "c,d", "e"]` **Using Length-Prefixed Encoding:** ``` Encoding: - "a#b" has length 3 → "3#a#b" - "c,d" has length 3 → "3#c,d" - "e" has length 1 → "1#e" - Result: "3#a#b3#c,d1#e" Decoding "3#a#b3#c,d1#e": - i=0: Find '#' at j=1 - Length = 3 - Read 3 chars: "a#b" (the # inside doesn't matter!) - i = 1 + 1 + 3 = 5 - i=5: Find '#' at j=6 - Length = 3 - Read 3 chars: "c,d" (the comma is just data) - i = 6 + 1 + 3 = 10 - i=10: Find '#' at j=11 - Length = 1 - Read 1 char: "e" - i = 11 + 1 + 1 = 13 (end) - Result: ["a#b", "c,d", "e"] ✓ ``` Notice how special characters don't break the decoding! ### Example 3: Empty Strings **Input**: `["", "abc", ""]` **Using Length-Prefixed Encoding:** ``` Encoding: - "" has length 0 → "0#" - "abc" has length 3 → "3#abc" - "" has length 0 → "0#" - Result: "0#3#abc0#" Decoding "0#3#abc0#": - i=0: Find '#' at j=1 - Length = 0 - Read 0 chars: "" - i = 1 + 1 + 0 = 2 - i=2: Find '#' at j=3 - Length = 3 - Read 3 chars: "abc" - i = 3 + 1 + 3 = 7 - i=7: Find '#' at j=8 - Length = 0 - Read 0 chars: "" - i = 8 + 1 + 0 = 9 (end) - Result: ["", "abc", ""] ✓ ``` Empty strings are handled gracefully! *** ## Key Takeaways * **Length-Prefixed Encoding** is the standard solution for this problem, used in real-world protocols * **Never rely solely on delimiters** when the delimited content is untrusted or arbitrary * **Separating metadata (length) from content** allows safe serialization of any data * **Linear time and space** is optimal for serialization problems * **Test edge cases**: empty strings, special characters, very long strings * **This pattern appears everywhere**: network protocols, binary serialization, and data encoding