> ## Documentation Index
> Fetch the complete documentation index at: https://hanzla.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# 06 encode decode strings

# String Encode and Decode

🔗 **NeetCode Problem**: [https://neetcode.io/problems/encode-and-decode-strings](https://neetcode.io/problems/encode-and-decode-strings)

🔗 **LeetCode Problem**: [https://leetcode.com/problems/encode-and-decode-strings](https://leetcode.com/problems/encode-and-decode-strings)

***

## Problem Statement

Design an algorithm to encode a list of strings to a single string, and decode the encoded string back to the original list of strings.

**Constraints:**

* The string may contain any possible characters.
* Strings can be empty.
* You cannot use external libraries like base64 encoding.

**Example:**

* Input: `["Hello","World"]`
* Encoded: `5#Hello5#World`
* Decoded: `["Hello","World"]`

***

## Approach 1: Brute Force

### Idea

Use a simple delimiter (like comma) to separate strings. However, this fails if the delimiter appears in the string itself.

### Pseudocode

```
encode(strs):
    result = ""
    for each str in strs:
        result += str + ","
    return result

decode(s):
    return s.split(",")
```

### C++ Implementation

```cpp theme={null}
class Solution {
public:
    string encode(vector<string>& strs) {
        string result = "";
        for(string str : strs)
            result += str + ",";
        return result;
    }

    vector<string> decode(string s) {
        vector<string> result;
        string current = "";
        for(char c : s) {
            if(c == ',') {
                result.push_back(current);
                current = "";
            } else {
                current += c;
            }
        }
        return result;
    }
};
```

### Complexity Analysis

* **Time Complexity**: **O(n)** → Where n is the total number of characters across all strings
* **Space Complexity**: **O(n)** → For storing the encoded result

### Why it's not optimal?

If any string contains the delimiter character (comma in this case), the decoding will break. For example:

* Input: `["a,b", "c"]`
* Encoded: `a,b,c,`
* Decoded: `["a", "b", "c"]` ❌ (Wrong!)

This approach cannot handle strings with delimiters in them.

***

## Optimized Approach: Length-Prefixed Encoding

### Idea

Instead of using a delimiter, encode the length of each string followed by a separator, then the string itself. This way, we know exactly how many characters belong to each string, making it impossible for content to interfere with the format.

### Key Insight

By storing the **length of each string** before the string itself, we can unambiguously decode any string, regardless of its content. The format becomes: `length#string` where `#` is a separator that guides our parsing, not a content delimiter.

### How it works:

1. **Encoding**: For each string, append its length, followed by `#`, followed by the string itself
2. **Decoding**: Parse the length until we hit `#`, then read exactly that many characters for the string
3. Repeat until we've decoded all strings

### C++ Implementation

```cpp theme={null}
class Solution {
public:
    string encode(vector<string>& strs) {
        string result = "";
        for(string str : strs)
            result += to_string(str.size()) + "#" + str;
        return result;
    }

    vector<string> decode(string s) {
        vector<string> result;
        int i = 0;
        while(i < s.size()) {
            int j = i;
            // Find the '#' that separates length from string
            while(s[j] != '#')
                j++;
            
            // Extract the length
            int len = stoi(s.substr(i, j - i));
            
            // Extract the string of that length
            string word = s.substr(j + 1, len);
            result.push_back(word);
            
            // Move to next string
            i = j + 1 + len;
        }
        return result;
    }
};
```

### Complexity Analysis

* **Time Complexity**: **O(n)** → Where n is the total number of characters. We process each character exactly once.
* **Space Complexity**: **O(n)** → For storing the encoded result (or decoded result in space-efficient implementation)

### Why it's optimal?

1. Works with any character in the string, including `#`, commas, or any special character
2. Linear time complexity - we never backtrack or revisit characters
3. Linear space complexity - only storing the output
4. Simple and elegant implementation
5. No edge cases that break the algorithm

***

## Comparison: Brute Force vs Optimized

| Approach                     | Time Complexity | Space Complexity | Pros                                             | Cons                           |
| ---------------------------- | --------------- | ---------------- | ------------------------------------------------ | ------------------------------ |
| **Brute Force (Delimiter)**  | O(n)            | O(n)             | Simple to understand                             | Fails with delimiter in string |
| **Length-Prefixed Encoding** | O(n)            | O(n)             | **Handles all characters**, Unambiguous decoding | Slightly more complex parsing  |

***

## Why Length-Prefixed Encoding is Most Optimal

1. **Universal Delimiter Independence** - No character can break the format since length dictates how many chars to read
2. **Linear Efficiency** - Single pass through the data for both encoding and decoding
3. **Handles All Edge Cases** - Empty strings, special characters, very long strings all work perfectly
4. **Deterministic Parsing** - No ambiguity; each byte of input maps to a unique output
5. **Industry Standard** - Used in protocols like gRPC, Protobuf, and network communication
6. **No External Dependencies** - Pure algorithm, no external libraries needed

***

## Example Walkthroughs

### Example 1: Simple Strings

**Input**: `["Hello", "World"]`

**Using Length-Prefixed Encoding:**

```
Encoding:
- "Hello" has length 5 → "5#Hello"
- "World" has length 5 → "5#World"
- Result: "5#Hello5#World"

Decoding "5#Hello5#World":
- i=0: Find '#' at j=1
  - Length = 5
  - Read 5 chars from j+1: "Hello"
  - i = 1 + 1 + 5 = 7
- i=7: Find '#' at j=8
  - Length = 5
  - Read 5 chars from j+1: "World"
  - i = 8 + 1 + 5 = 14 (end)
- Result: ["Hello", "World"] ✓
```

Strings successfully encoded and decoded!

### Example 2: Strings with Special Characters

**Input**: `["a#b", "c,d", "e"]`

**Using Length-Prefixed Encoding:**

```
Encoding:
- "a#b" has length 3 → "3#a#b"
- "c,d" has length 3 → "3#c,d"
- "e" has length 1 → "1#e"
- Result: "3#a#b3#c,d1#e"

Decoding "3#a#b3#c,d1#e":
- i=0: Find '#' at j=1
  - Length = 3
  - Read 3 chars: "a#b" (the # inside doesn't matter!)
  - i = 1 + 1 + 3 = 5
- i=5: Find '#' at j=6
  - Length = 3
  - Read 3 chars: "c,d" (the comma is just data)
  - i = 6 + 1 + 3 = 10
- i=10: Find '#' at j=11
  - Length = 1
  - Read 1 char: "e"
  - i = 11 + 1 + 1 = 13 (end)
- Result: ["a#b", "c,d", "e"] ✓
```

Notice how special characters don't break the decoding!

### Example 3: Empty Strings

**Input**: `["", "abc", ""]`

**Using Length-Prefixed Encoding:**

```
Encoding:
- "" has length 0 → "0#"
- "abc" has length 3 → "3#abc"
- "" has length 0 → "0#"
- Result: "0#3#abc0#"

Decoding "0#3#abc0#":
- i=0: Find '#' at j=1
  - Length = 0
  - Read 0 chars: ""
  - i = 1 + 1 + 0 = 2
- i=2: Find '#' at j=3
  - Length = 3
  - Read 3 chars: "abc"
  - i = 3 + 1 + 3 = 7
- i=7: Find '#' at j=8
  - Length = 0
  - Read 0 chars: ""
  - i = 8 + 1 + 0 = 9 (end)
- Result: ["", "abc", ""] ✓
```

Empty strings are handled gracefully!

***

## Key Takeaways

* **Length-Prefixed Encoding** is the standard solution for this problem, used in real-world protocols
* **Never rely solely on delimiters** when the delimited content is untrusted or arbitrary
* **Separating metadata (length) from content** allows safe serialization of any data
* **Linear time and space** is optimal for serialization problems
* **Test edge cases**: empty strings, special characters, very long strings
* **This pattern appears everywhere**: network protocols, binary serialization, and data encoding
