# Module 5 - Advanced Sorting Algorithms & The Heap Data Structure

# 1. Motivation: The Quest for the Ideal Sorting Algorithm

<span class="ng-star-inserted">In our study of advanced sorting algorithms, we have explored powerful techniques that offer significant performance improvements over basic O(n²) methods. However, these advanced algorithms often come with their own set of trade-offs. Let's recap two of the most prominent examples: Merge Sort and Quick Sort.</span>

#### **<span class="ng-star-inserted">A Recap of Trade-offs</span>**

**<span class="ng-star-inserted">Merge Sort</span>**

- **<span class="ng-star-inserted">Strength:</span>**<span class="ng-star-inserted"> Its greatest advantage is its </span>**<span class="ng-star-inserted">guaranteed Θ(n log n) performance</span>**<span class="ng-star-inserted">. This efficiency holds true for all cases—worst, average, and best—making it exceptionally reliable and predictable.</span>
- **<span class="ng-star-inserted">Weakness:</span>**<span class="ng-star-inserted"> It is not an </span><span class="ng-star-inserted">in-place</span><span class="ng-star-inserted"> algorithm. Merge Sort requires auxiliary memory proportional to the size of the input array, giving it a space complexity of </span>**<span class="ng-star-inserted">O(n)</span>**<span class="ng-star-inserted">. This can be a significant drawback when memory is limited.</span>

**<span class="ng-star-inserted">Quick Sort</span>**

- **<span class="ng-star-inserted">Strength:</span>**<span class="ng-star-inserted"> It is an </span>**<span class="ng-star-inserted">in-place</span>**<span class="ng-star-inserted"> algorithm, requiring only a small, logarithmic amount of space on the recursion stack (</span>**<span class="ng-star-inserted">O(log n)</span>**<span class="ng-star-inserted">). In the average case, it is often faster in practice than other O(n log n) algorithms due to its low constant factors.</span>
- **<span class="ng-star-inserted">Weakness:</span>**<span class="ng-star-inserted"> Its primary disadvantage is its worst-case performance. If the pivot selection is poor (e.g., on an already sorted array), its performance degrades significantly to a slow </span>**<span class="ng-star-inserted">Θ(n²)</span>**<span class="ng-star-inserted">, which is no better than basic sorting methods like Bubble Sort.</span>

#### **<span class="ng-star-inserted">The Key Question</span>**

<span class="ng-star-inserted">This analysis of trade-offs leads to a crucial question: </span>**<span class="ng-star-inserted">Is there an algorithm that offers the "best of both worlds"?</span>**<span class="ng-star-inserted"> Can we find a sorting method that combines:</span>

1. <span class="ng-star-inserted">The guaranteed </span>**<span class="ng-star-inserted">Θ(n log n)</span>**<span class="ng-star-inserted"> worst-case performance of Merge Sort.</span>
2. <span class="ng-star-inserted">The </span>**<span class="ng-star-inserted">O(1)</span>**<span class="ng-star-inserted"> space efficiency (in-place nature) of Quick Sort.</span>

#### **<span class="ng-star-inserted">The Answer: Heap Sort</span>**

<span class="ng-star-inserted">The answer is </span>**<span class="ng-star-inserted">yes</span>**<span class="ng-star-inserted">, and one of the most classic algorithms that achieves this powerful combination is </span>**<span class="ng-star-inserted">Heap Sort</span>**<span class="ng-star-inserted">. It stands as a testament to how the right choice of an underlying data structure can lead to an algorithm with an excellent performance profile.</span>

<span class="ng-star-inserted">To fully understand how Heap Sort achieves this, we must first dive into the data structure that powers it: the </span>**<span class="ng-star-inserted">Heap</span>**<span class="ng-star-inserted">. The following sub-modules will build our understanding from the ground up, starting with the basic concepts of trees and leading to the full implementation and analysis of Heap Sort.</span>

# 2. Prerequisite: An Introduction to the Tree Data Structure

<span class="ng-star-inserted">Before we can understand the Heap, we must first be familiar with its underlying structure: the Tree. In computer science, a </span>**<span class="ng-star-inserted">tree</span>**<span class="ng-star-inserted"> is a widely used data structure that simulates a hierarchical structure, with a set of connected nodes.</span>

#### **<span class="ng-star-inserted">Core Components</span>**

<span class="ng-star-inserted">Every tree is composed of a few fundamental components:</span>

- **<span class="ng-star-inserted">Node:</span>**<span class="ng-star-inserted"> The primary entity in a tree that contains data. It is also known as a "vertex".</span>
- **<span class="ng-star-inserted">Edge:</span>**<span class="ng-star-inserted"> A connection or link between two nodes.</span>
- **<span class="ng-star-inserted">Root:</span>**<span class="ng-star-inserted"> The topmost node in a tree. It is the only node that does not have a parent.</span>

<span class="ng-star-inserted">Consider the tree structure below:</span>

- <span class="ng-star-inserted">Nodes: {A, B, C, D, E}</span>
- <span class="ng-star-inserted">Edges: {(A-B), (A-C), (B-D), (B-E)}</span>
- <span class="ng-star-inserted">Root: Node A</span>

#### **<span class="ng-star-inserted">Hierarchical Terminology</span>**

<span class="ng-star-inserted">The relationships between nodes in a tree are described using terminology borrowed from family trees:</span>

- **<span class="ng-star-inserted">Parent:</span>**<span class="ng-star-inserted"> A node that is directly above another node and connected to it. </span><span class="ng-star-inserted">In our example, A is the parent of B and C.</span>
- **<span class="ng-star-inserted">Child:</span>**<span class="ng-star-inserted"> A node that is directly below another node and connected to it. </span><span class="ng-star-inserted">In our example, D and E are children of B.</span>
- **<span class="ng-star-inserted">Leaf:</span>**<span class="ng-star-inserted"> A node that has no children. </span><span class="ng-star-inserted">In our example, C, D, and E are leaf nodes.</span>

#### **<span class="ng-star-inserted">Structure and Depth</span>**

<span class="ng-star-inserted">We can measure the structure of a tree in several ways:</span>

- **<span class="ng-star-inserted">Level:</span>**<span class="ng-star-inserted"> The level of a node is defined by its distance from the root. The root is at </span>**<span class="ng-star-inserted">Level 0</span>**<span class="ng-star-inserted">. Its children are at Level 1, their children are at Level 2, and so on.</span>
- **<span class="ng-star-inserted">Depth/Height:</span>**<span class="ng-star-inserted"> The height of a tree is the number of edges on the longest path from the root to a leaf. </span><span class="ng-star-inserted">Our example tree has a height of 2.</span>

#### **<span class="ng-star-inserted">Focus on the Binary Tree</span>**

<span class="ng-star-inserted">While a node in a tree can have any number of children, our focus for understanding heaps will be on a specific type: the </span>**<span class="ng-star-inserted">Binary Tree</span>**<span class="ng-star-inserted">.</span>

<span class="ng-star-inserted">A </span>**<span class="ng-star-inserted">Binary Tree</span>**<span class="ng-star-inserted"> is a tree data structure in which each node has </span>**<span class="ng-star-inserted">at most two children</span>**<span class="ng-star-inserted">, which are referred to as the </span><span class="ng-star-inserted">left child</span><span class="ng-star-inserted"> and the </span><span class="ng-star-inserted">right child</span><span class="ng-star-inserted">.</span>

<span class="ng-star-inserted">The reason we focus on Binary Trees is that the </span>**<span class="ng-star-inserted">Heap</span>**<span class="ng-star-inserted"> data structure, which is the engine of Heap Sort, is a specialized type of Binary Tree. Mastering this concept is a fundamental step toward understanding heaps.</span>

# 3. The Heap Data Structure

<span class="ng-star-inserted">Now that we understand the concept of a binary tree, we can define a Heap. A </span>**<span class="ng-star-inserted">Heap</span>**<span class="ng-star-inserted"> is a specialized tree-based data structure that satisfies two specific properties.</span>

#### **<span class="ng-star-inserted">The Two Defining Properties of a Heap</span>**

<span class="ng-star-inserted">For a binary tree to be considered a Heap, it must adhere to the following rules:</span>

1. **<span class="ng-star-inserted">Structure Property: It must be an </span><span class="ng-star-inserted">Essentially Complete Binary Tree</span><span class="ng-star-inserted">.</span>**  
    <span class="ng-star-inserted">This means that the tree is completely filled on all levels, with the possible exception of the last level, which must be filled from </span>**<span class="ng-star-inserted">left to right</span>**<span class="ng-star-inserted"> without any gaps. This "no gaps" property is crucial, as it allows a heap to be stored efficiently in an array.</span>
2. **<span class="ng-star-inserted">Heap Property (Order Property):</span>**<span class="ng-star-inserted"> The nodes must be ordered in a specific way relative to their children. This ordering defines the type of heap.</span>

#### **<span class="ng-star-inserted">Types of Heaps</span>**

<span class="ng-star-inserted">There are two main types of heaps, distinguished by the Heap Property they enforce:</span>

- **<span class="ng-star-inserted">Max-Heap:</span>**<span class="ng-star-inserted"> The value of every node is </span>**<span class="ng-star-inserted">greater than or equal to</span>**<span class="ng-star-inserted"> the value of each of its children.</span>
    
    
    - **<span class="ng-star-inserted">Implication:</span>**<span class="ng-star-inserted"> In a Max-Heap, the </span>**<span class="ng-star-inserted">largest element</span>**<span class="ng-star-inserted"> in the entire collection is always located at the </span>**<span class="ng-star-inserted">root</span>**<span class="ng-star-inserted"> of the tree. This is the type of heap used for Heap Sort.</span>
    - <span class="ng-star-inserted">Example: A tree with root 10 and children 8 and 9 is a valid Max-Heap at the top level.</span>
- **<span class="ng-star-inserted">Min-Heap:</span>**<span class="ng-star-inserted"> The value of every node is </span>**<span class="ng-star-inserted">less than or equal to</span>**<span class="ng-star-inserted"> the value of each of its children.</span>
    
    
    - **<span class="ng-star-inserted">Implication:</span>**<span class="ng-star-inserted"> In a Min-Heap, the </span>**<span class="ng-star-inserted">smallest element</span>**<span class="ng-star-inserted"> is always located at the </span>**<span class="ng-star-inserted">root</span>**<span class="ng-star-inserted">. This structure is commonly used to implement Priority Queues.</span>
    - <span class="ng-star-inserted">Example: A tree with root 2 and children 5 and 3 is a valid Min-Heap at the top level.</span>

#### **<span class="ng-star-inserted">The Array Representation</span>**

<span class="ng-star-inserted">The </span><span class="ng-star-inserted">complete binary tree</span><span class="ng-star-inserted"> property is precisely what makes an array a perfect and highly efficient way to store a heap. The hierarchical relationships are not stored with pointers, but are instead calculated mathematically based on an element's index.</span>

<span class="ng-star-inserted">For an element at a </span>**<span class="ng-star-inserted">zero-based index </span><span class="inline-code ng-star-inserted">i</span>**<span class="ng-star-inserted"> in an array representing a heap:</span>

- <span class="ng-star-inserted">The index of its </span>**<span class="ng-star-inserted">parent</span>**<span class="ng-star-inserted"> is calculated as: </span><span class="inline-code ng-star-inserted">floor((i - 1) / 2)</span>
- <span class="ng-star-inserted">The index of its </span>**<span class="ng-star-inserted">left child</span>**<span class="ng-star-inserted"> is calculated as: </span><span class="inline-code ng-star-inserted">2 \* i + 1</span>
- <span class="ng-star-inserted">The index of its </span>**<span class="ng-star-inserted">right child</span>**<span class="ng-star-inserted"> is calculated as: </span><span class="inline-code ng-star-inserted">2 \* i + 2</span>

<span class="ng-star-inserted">For example, for the node at index <span class="inline-code ng-star-inserted">i = 3</span>:</span>

- <span class="ng-star-inserted">Its parent is at index <span class="inline-code ng-star-inserted">floor((3 - 1) / 2) = floor(1) = 1</span>.</span>
- <span class="ng-star-inserted">Its left child would be at index <span class="inline-code ng-star-inserted">2 \* 3 + 1 = 7</span>.</span>

# 4. Core Operations and the Heap Sort Algorithm

<span class="ng-star-inserted">The Heap Sort algorithm is a two-phase process that masterfully uses the properties of the Max-Heap. Both phases rely on a core "helper" operation that maintains the heap property.</span>

#### **<span class="ng-star-inserted">The Engine of the Heap: The </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> Operation</span>**

<span class="ng-star-inserted">To build and maintain a heap, we need a procedure to fix any violations of the heap property. The primary operation used in Heap Sort is </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> (also known as </span><span class="ng-star-inserted">heapify-down</span><span class="ng-star-inserted">).</span>

- **<span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> (or Heapify-Down):</span>**<span class="ng-star-inserted"> This operation is used when a node's value is </span>**<span class="ng-star-inserted">smaller than</span>**<span class="ng-star-inserted"> one of its children, violating the Max-Heap property. The </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> process corrects this by repeatedly swapping the parent with its </span>**<span class="ng-star-inserted">largest child</span>**<span class="ng-star-inserted">, effectively "sinking" the smaller element down the tree until the heap property is restored for that subtree. The time complexity for a single </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> operation is proportional to the height of the tree, making it </span>**<span class="ng-star-inserted">O(log n)</span>**<span class="ng-star-inserted">.</span>
- **<span class="inline-code ng-star-inserted">siftup</span><span class="ng-star-inserted"> (or Heapify-Up):</span>**<span class="ng-star-inserted"> While less critical for our Heap Sort implementation, this complementary operation is used when a newly inserted node is </span>**<span class="ng-star-inserted">larger than</span>**<span class="ng-star-inserted"> its parent. It "bubbles" the element up the tree by swapping it with its parent until the heap property is restored. This is the key operation used when adding elements to a </span><span class="inline-code ng-star-inserted">std::priority\_queue</span><span class="ng-star-inserted">.</span>

<span class="ng-star-inserted">With the </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> operation as our main tool, we can now construct the two phases of Heap Sort.</span>

#### **<span class="ng-star-inserted">Phase 1: </span><span class="inline-code ng-star-inserted">makeheap</span><span class="ng-star-inserted"> (The Heapify Process)</span>**

<span class="ng-star-inserted">The first step is to convert the unsorted input array into a valid Max-Heap.</span>

- **<span class="ng-star-inserted">Goal:</span>**<span class="ng-star-inserted"> Rearrange the elements of the array so that they satisfy the Max-Heap property.</span>
- **<span class="ng-star-inserted">Method (Bottom-Up):</span>**<span class="ng-star-inserted"> The most efficient way to do this is with a </span><span class="ng-star-inserted">bottom-up</span><span class="ng-star-inserted"> approach. We treat the entire array as a complete binary tree and then iteratively fix it. The process starts from the </span>**<span class="ng-star-inserted">last non-leaf node</span>**<span class="ng-star-inserted"> and moves upwards towards the root. For each node, we call </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> to ensure its subtree is a valid Max-Heap. By the time we reach the root, the entire array is guaranteed to be a Max-Heap.</span>
- **<span class="ng-star-inserted">Analysis:</span>**<span class="ng-star-inserted"> While it involves multiple calls to </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> (an O(log n) operation), a tight analysis shows that the </span><span class="inline-code ng-star-inserted">makeheap</span><span class="ng-star-inserted"> phase can be completed in </span>**<span class="ng-star-inserted">O(n) linear time</span>**<span class="ng-star-inserted">, which is remarkably efficient.</span>

#### **<span class="ng-star-inserted">Phase 2: The Sorting Process</span>**

<span class="ng-star-inserted">Once the array is a Max-Heap, the largest element is at the root (</span><span class="inline-code ng-star-inserted">array\[0\]</span><span class="ng-star-inserted">). The sorting phase systematically extracts this largest element and places it in its correct final position.</span>

<span class="ng-star-inserted">This is done through a repeated process:</span>

1. **<span class="ng-star-inserted">Swap:</span>**<span class="ng-star-inserted"> Swap the root element (</span><span class="inline-code ng-star-inserted">array\[0\]</span><span class="ng-star-inserted">, the current maximum) with the </span>**<span class="ng-star-inserted">last element</span>**<span class="ng-star-inserted"> in the heap portion of the array. The largest element is now in its final, sorted position at the end of the array.</span>
2. **<span class="ng-star-inserted">Shrink:</span>**<span class="ng-star-inserted"> The effective size of the heap is reduced by one, "locking in" the sorted element at the end so it is no longer considered part of the heap.</span>
3. **<span class="ng-star-inserted">Repair:</span>**<span class="ng-star-inserted"> The new root element (which was previously the last element) likely violates the Max-Heap property. Call </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted"> on the root (</span><span class="inline-code ng-star-inserted">array\[0\]</span><span class="ng-star-inserted">) to repair the heap, ensuring the next largest element rises to the top.</span>

<span class="ng-star-inserted">This cycle is repeated </span><span class="inline-code ng-star-inserted">n-1</span><span class="ng-star-inserted"> times, until the entire array is sorted.</span>

#### **<span class="ng-star-inserted">Complexity Analysis of Heap Sort</span>**

- **<span class="ng-star-inserted">Time Complexity:</span>**
    
    
    - <span class="ng-star-inserted">Phase 1 (</span><span class="inline-code ng-star-inserted">makeheap</span><span class="ng-star-inserted">): </span>**<span class="ng-star-inserted">O(n)</span>**
    - <span class="ng-star-inserted">Phase 2 (Sorting): Consists of </span><span class="inline-code ng-star-inserted">n-1</span><span class="ng-star-inserted"> calls to </span><span class="inline-code ng-star-inserted">siftdown</span><span class="ng-star-inserted">, each taking </span>**<span class="ng-star-inserted">O(log n)</span>**<span class="ng-star-inserted"> time. Total time for this phase is </span>**<span class="ng-star-inserted">O(n log n)</span>**<span class="ng-star-inserted">.</span>
    - <span class="ng-star-inserted">The overall complexity is dominated by the sorting phase, giving Heap Sort a guaranteed </span>**<span class="ng-star-inserted">Θ(n log n)</span>**<span class="ng-star-inserted"> time complexity for worst-case, average-case, and best-case scenarios.</span>
- **<span class="ng-star-inserted">Space Complexity:</span>**
    
    
    - <span class="ng-star-inserted">The algorithm operates directly on the input array, swapping elements within it. It requires no significant extra storage. Therefore, Heap Sort is an </span>**<span class="ng-star-inserted">in-place</span>**<span class="ng-star-inserted"> algorithm with a space complexity of </span>**<span class="ng-star-inserted">O(1)</span>**<span class="ng-star-inserted">.</span>

# 5. The Heap in Practice: The C++ Standard Template Library (STL)

<span class="ng-star-inserted">While understanding how to build Heap Sort from scratch is crucial for algorithmic knowledge, in modern C++, we often leverage the powerful abstractions provided by the Standard Template Library (STL). The concepts of a heap are primarily exposed through </span><span class="inline-code ng-star-inserted">std::priority\_queue</span><span class="ng-star-inserted">.</span>

#### **<span class="inline-code ng-star-inserted">std::priority\_queue</span><span class="ng-star-inserted">: A Ready-to-Use Heap</span>**

<span class="ng-star-inserted">The C++ STL provides a container adapter called </span><span class="inline-code ng-star-inserted">std::priority\_queue</span><span class="ng-star-inserted"> that is an implementation of a Heap.</span>

- **<span class="ng-star-inserted">Default Behavior:</span>**<span class="ng-star-inserted"> By default, </span><span class="inline-code ng-star-inserted">std::priority\_queue&lt;int&gt;</span><span class="ng-star-inserted"> is a </span>**<span class="ng-star-inserted">Max-Heap</span>**<span class="ng-star-inserted">. This means that when you call </span><span class="inline-code ng-star-inserted">top()</span><span class="ng-star-inserted">, it returns the largest element, and when you </span><span class="inline-code ng-star-inserted">pop()</span><span class="ng-star-inserted">, it removes the largest element while maintaining the heap property.</span>
- **<span class="ng-star-inserted">Core Operations:</span>**
    
    
    - <span class="inline-code ng-star-inserted">push()</span><span class="ng-star-inserted">: Adds an element to the queue (heap). This internally performs a </span><span class="ng-star-inserted">sift-up</span><span class="ng-star-inserted"> operation. Complexity: </span>**<span class="ng-star-inserted">O(log n)</span>**<span class="ng-star-inserted">.</span>
    - <span class="inline-code ng-star-inserted">pop()</span><span class="ng-star-inserted">: Removes the top element. Complexity: </span>**<span class="ng-star-inserted">O(log n)</span>**<span class="ng-star-inserted">.</span>
    - <span class="inline-code ng-star-inserted">top()</span><span class="ng-star-inserted">: Returns a reference to the top element without removing it. Complexity: </span>**<span class="ng-star-inserted">O(1)</span>**<span class="ng-star-inserted">.</span>

#### **<span class="ng-star-inserted">Working with Custom Data Types</span>**

<span class="ng-star-inserted">What happens if we want to create a priority queue of custom objects, like a </span><span class="inline-code ng-star-inserted">struct</span><span class="ng-star-inserted"> for patients in a hospital?</span>

```c++
struct Patient {
    std::string name;
    int triage_level; // Level 1 is highest priority
};

// This will cause a compile error!
std::priority_queue<Patient> er_queue; 
```

<span class="ng-star-inserted">The compiler doesn't know how to compare two </span><span class="inline-code ng-star-inserted">Patient</span><span class="ng-star-inserted"> objects. To solve this, we must provide our own custom comparison logic.</span>

#### **<span class="ng-star-inserted">Custom Comparators: Functors and Lambda Expressions</span>**

<span class="ng-star-inserted">We can tell </span><span class="inline-code ng-star-inserted">std::priority\_queue</span><span class="ng-star-inserted"> how to order our custom types using a </span>**<span class="ng-star-inserted">custom comparator</span>**<span class="ng-star-inserted">. There are two common ways to do this in modern C++:</span>

1. **<span class="ng-star-inserted">Functor (Function Object):</span>**<span class="ng-star-inserted"> A </span><span class="inline-code ng-star-inserted">struct</span><span class="ng-star-inserted"> or </span><span class="inline-code ng-star-inserted">class</span><span class="ng-star-inserted"> that overloads the function call operator </span><span class="inline-code ng-star-inserted">operator()</span><span class="ng-star-inserted">. The </span><span class="inline-code ng-star-inserted">priority\_queue</span><span class="ng-star-inserted"> will create an object of this type and use its </span><span class="inline-code ng-star-inserted">operator()</span><span class="ng-star-inserted"> to compare elements. This is a powerful, stateful way to define comparison logic.  
    </span>
    
    ```c++
    struct ComparePatients {
        bool operator()(const Patient& a, const Patient& b) {
            // Because priority_queue is a Max-Heap, it puts the "larger"
            // element on top. We want level 1 to be the highest priority,
            // so we tell the queue that 'a' is "less than" 'b' if its
            // triage level is numerically greater.
            return a.triage_level > b.triage_level;
        }
    };
    
    // Usage:
    std::priority_queue<Patient, std::vector<Patient>, ComparePatients> er_queue;
    ```
2. <span class="ng-star-inserted">**Lambda Expression:** An inline, anonymous function that can be defined on the spot. Lambdas are often more concise and readable for simple, stateless comparison logic. They are commonly used with algorithms like <span class="inline-code ng-star-inserted">std::sort</span>.  
    </span>```c++
    auto compare_lambda = [](const Patient& a, const Patient& b) {
        return a.triage_level > b.triage_level;
    };
    
    // Usage with priority_queue is slightly more verbose
    std::priority_queue<Patient, std::vector<Patient>, decltype(compare_lambda)> er_queue(compare_lambda);
    ```

<span class="ng-star-inserted">By providing these comparators, we can leverage the highly optimized and safe implementation of a heap provided by the STL for any data type we need.</span>