Heap

Heap Examples
class BinaryHeap {
  constructor () {
    this.heap = []
  }

  insert (value) {
    this.heap.push(value)
    this.heapify()
  }

  size () {
    return this.heap.length
  }

  empty () {
    return this.size() === 0
  }

  // using iterative approach to reorder the heap after insertion
  heapify () {
    let index = this.size() - 1

    while (index > 0) {
      const element = this.heap[index]
      const parentIndex = Math.floor((index - 1) / 2)
      const parent = this.heap[parentIndex]

      if (parent[0] >= element[0]) break
      this.heap[index] = parent
      this.heap[parentIndex] = element
      index = parentIndex
    }
  }

  // Extracting the maximum element from the Heap
  extractMax () {
    const max = this.heap[0]
    const tmp = this.heap.pop()
    if (!this.empty()) {
      this.heap[0] = tmp
      this.sinkDown(0)
    }
    return max
  }

  // To restore the balance of the heap after extraction.
  sinkDown (index) {
    const left = 2 * index + 1
    const right = 2 * index + 2
    let largest = index
    const length = this.size()

    if (left < length && this.heap[left][0] > this.heap[largest][0]) {
      largest = left
    }
    if (right < length && this.heap[right][0] > this.heap[largest][0]) {
      largest = right
    }
    // swap
    if (largest !== index) {
      const tmp = this.heap[largest]
      this.heap[largest] = this.heap[index]
      this.heap[index] = tmp
      this.sinkDown(largest)
    }
  }
}

const maxHeap = new BinaryHeap()
maxHeap.insert([4])
maxHeap.insert([3])
maxHeap.insert([6])
maxHeap.insert([1])
maxHeap.insert([8])
maxHeap.insert([2])

while (!maxHeap.empty()) {
  const mx = maxHeap.extractMax()
  console.log(mx)
}

Overview of heap

2. Representation

In a min heap, every parent node has a value less than or equal to the values of its child nodes, so the root node holds the smallest value. When the ordering is reversed (every parent is greater than or equal to its children), we call it a max heap. In the following discussion, "heap" means a min heap.

You can access the parent node or the child nodes of the node at index i in the array with the indices below.

  • A root node|i = 1, the first item of the array

  • A parent node|parent(i) = floor(i / 2)

  • A left child node|left(i) = 2i

  • A right child node|right(i)=2i+1

For example, take the node at index 4. Its parent node is the item at index 2, since parent(4) = 4 / 2 = 2. Its child nodes are the items at index 8 and 9, since left(4) = 2 * 4 = 8 and right(4) = 2 * 4 + 1 = 9, respectively.
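The index formulas above can be checked with a short Python sketch. The helper names parent, left, and right simply mirror the formulas; the placeholder at index 0 and the sample values are assumptions made here so that a Python list can be indexed from 1.

```python
# 1-based index helpers for a heap stored in an array.
def parent(i):
    return i // 2  # integer division rounds down


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


# Hypothetical sample min heap. Index 0 is an unused placeholder so that
# the root node sits at index 1, as in the formulas above.
heap = [None, 1, 3, 2, 7, 4, 5, 6, 9, 8]

i = 4
print(parent(i), left(i), right(i))   # → 2 8 9
print(heap[parent(i)])                # → 3
print(heap[left(i)], heap[right(i)])  # → 9 8
```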

3. How to build a heap

You need two operations to build a heap from an arbitrary array.

  1. min_heapify|make some node and its descendant nodes meet the heap property.

  2. build_min_heap|produce a heap from an arbitrary array.

We can build a heap by applying min_heapify to each node repeatedly.

3.1 min_heapify

In min_heapify, we exchange a node with one of its child nodes to restore the heap property. It applies under the two conditions below:

  1. The node and its child nodes don’t satisfy the heap property.

  2. Each child node and its descendant nodes satisfy the heap property.

Here we define min_heapify(array, index). This method takes two arguments, array and index. We assume this method exchanges the node at array[index] with its child nodes until the heap property is satisfied.

For example, exchanging the node at index 4 with the node at index 8 makes that subtree satisfy the heap property.

These operations above produce the heap from the unordered tree (the array).

3.2 build_min_heap

The pseudo-code below shows how build_min_heap works.

build_min_heap(array)
    for i=n/2 downto 1
        do min_heapify(array, i)

Each node meets the conditions for applying min_heapify by the time it is visited, because this function iterates over the nodes from the bottom (the second-to-last level) to the top (the root node level). For instance, this function first applies min_heapify to the nodes at index 4 and index 5 and then applies min_heapify to the node at index 2. So the descendant nodes of a given index already satisfy the heap property when min_heapify is applied to it.

4. Time complexity

Let’s think about the time complexity of build_min_heap. First, consider the time complexity of min_heapify, which is the main part of build_min_heap.

min_heapify repeats the operation of exchanging two items in an array, which runs in constant time. So the time complexity of min_heapify is proportional to the number of repetitions. In the worst case, min_heapify repeats the operation as many times as the height of the tree, because the node it starts from may have to move all the way down to the deepest leaf level. With h as the height of the node that min_heapify is applied to, min_heapify takes O(h) time.

So we need to know the height of the tree to get the time complexity. A complete binary tree with n nodes has height floor(log2(n)).

Finally, we get O(n) as the time complexity of build_min_heap, because most nodes sit near the bottom of the tree, where min_heapify does very little work. Also, we get O(log n) as the time complexity of a single call to min_heapify.
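The step from O(h) per call to O(n) overall can be sketched as follows. It assumes a standard property of complete binary trees that is not stated in the text: a tree with n nodes has at most ⌈n / 2^(h+1)⌉ nodes of height h.

```latex
\begin{aligned}
T(n) &= \sum_{h=0}^{\lfloor \log_2 n \rfloor}
        \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
      = O\!\left( n \sum_{h=0}^{\lfloor \log_2 n \rfloor} \frac{h}{2^{h}} \right), \\
\sum_{h=0}^{\infty} \frac{h}{2^{h}} &= 2
\quad\Longrightarrow\quad
T(n) = O(2n) = O(n).
\end{aligned}
```

A single call to min_heapify at the root costs O(h) with h = ⌊log₂ n⌋, which gives the O(log n) bound.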

5. Implementation

Here we implement min_heapify and build_min_heap in Python. The implementation of min_heapify is as follows.

def min_heapify(array, i):
    left = 2 * i + 1
    right = 2 * i + 2
    length = len(array) - 1
    smallest = i
    if left <= length and array[i] > array[left]:
        smallest = left
    if right <= length and array[smallest] > array[right]:
        smallest = right
    if smallest != i:
        array[i], array[smallest] = array[smallest], array[i]
        min_heapify(array, smallest)

First, this method finds the node with the smallest value among the node at index i and its child nodes, and then exchanges that node with the node at index i. When an exchange happens, the method applies min_heapify again at the index the original node moved to.

Because the index of a list (an array) in Python starts from 0, the way to access the nodes changes as follows.

  • The root node|i = 0

  • The parent node|parent(i) = floor((i - 1) / 2)

  • The left child node|left(i) = 2i + 1

  • The right child node|right(i)=2i+2

The variable smallest holds the index of the node with the smallest value. If smallest is not equal to i, which means this subtree doesn’t satisfy the heap property, the method exchanges the two nodes and calls min_heapify on the node at index smallest.
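As a concrete check of the behavior described above, the sketch below repeats min_heapify from this section and applies it to a small list. The sample list is made up for illustration: only its root violates the heap property, which matches the conditions min_heapify expects.

```python
def min_heapify(array, i):
    left = 2 * i + 1
    right = 2 * i + 2
    length = len(array) - 1  # index of the last item
    smallest = i
    if left <= length and array[i] > array[left]:
        smallest = left
    if right <= length and array[smallest] > array[right]:
        smallest = right
    if smallest != i:
        array[i], array[smallest] = array[smallest], array[i]
        min_heapify(array, smallest)


# Hypothetical sample data: both subtrees of the root are valid min heaps,
# but the root value 9 is larger than its children.
array = [9, 1, 2, 3, 4, 5, 6]
min_heapify(array, 0)
print(array)  # → [1, 3, 2, 9, 4, 5, 6]
```

The value 9 is exchanged first with 1 (index 1) and then with 3 (index 3), where it becomes a leaf.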

The implementation of build_min_heap is almost the same as the pseudo-code.

def build_min_heap(array):
    for i in reversed(range(len(array)//2)):
        min_heapify(array, i)

The for-loop differs from the pseudo-code because the indices start from 0, but the behavior is the same. This for-loop also iterates over the nodes from the second-to-last level up to the root node.
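One way to convince yourself that build_min_heap works is to verify the heap property afterwards. The sketch below repeats the two functions from this section; the checker is_min_heap and the sample list are hypothetical additions for illustration only.

```python
def min_heapify(array, i):
    left = 2 * i + 1
    right = 2 * i + 2
    length = len(array) - 1
    smallest = i
    if left <= length and array[i] > array[left]:
        smallest = left
    if right <= length and array[smallest] > array[right]:
        smallest = right
    if smallest != i:
        array[i], array[smallest] = array[smallest], array[i]
        min_heapify(array, smallest)


def build_min_heap(array):
    for i in reversed(range(len(array) // 2)):
        min_heapify(array, i)


def is_min_heap(array):
    # Hypothetical helper: every non-root node must be >= its parent.
    return all(array[(i - 1) // 2] <= array[i] for i in range(1, len(array)))


data = [5, 3, 8, 1, 9, 2]
print(is_min_heap(data))  # → False
build_min_heap(data)
print(data)               # → [1, 3, 2, 5, 9, 8]
print(is_min_heap(data))  # → True
```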

6. Heapsort

Heapsort is a sorting algorithm that uses a heap. It’s easy to implement with min_heapify and build_min_heap. The flow of the sort is as follows. Please note that the sort order is ascending.

  1. Build a heap from an arbitrary array with build_min_heap.

  2. Swap the first item with the last item in the array.

  3. Remove the last item from the array.

  4. Run min_heapify on the first item.

  5. Go back to step 2 and repeat until the array is empty.

In a heap, the smallest item is the first item of the array. After step 3, only the new first item can violate the heap property, since the subtrees below it are still heaps, so the array satisfies the conditions for applying min_heapify. Because of this, we can sort an array by repeating steps 2 to 4.

The implementation of heapsort is as follows.

def heapsort(array):
    array = array.copy()
    build_min_heap(array)
    sorted_array = []
    for _ in range(len(array)):
        array[0], array[-1] = array[-1], array[0]
        sorted_array.append(array.pop())
        min_heapify(array, 0)
    return sorted_array

The time complexity of heapsort is O(n log n): build_min_heap takes O(n), and we then run min_heapify, which takes O(log n), once per item, n times in total.
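To watch steps 2 to 4 at work, the sketch below runs the same loop but prints the sorted part and the remaining heap after each round. The function name heapsort_trace and the sample input are hypothetical; min_heapify and build_min_heap are repeated from the sections above so the block runs on its own.

```python
def min_heapify(array, i):
    left = 2 * i + 1
    right = 2 * i + 2
    length = len(array) - 1
    smallest = i
    if left <= length and array[i] > array[left]:
        smallest = left
    if right <= length and array[smallest] > array[right]:
        smallest = right
    if smallest != i:
        array[i], array[smallest] = array[smallest], array[i]
        min_heapify(array, smallest)


def build_min_heap(array):
    for i in reversed(range(len(array) // 2)):
        min_heapify(array, i)


def heapsort_trace(array):
    """Same steps as heapsort, printing the state after each round."""
    array = array.copy()
    build_min_heap(array)                           # step 1
    sorted_array = []
    while array:
        array[0], array[-1] = array[-1], array[0]   # step 2
        sorted_array.append(array.pop())            # step 3
        min_heapify(array, 0)                       # step 4
        print("sorted:", sorted_array, "heap:", array)
    return sorted_array


result = heapsort_trace([4, 1, 3, 2])
print(result)  # → [1, 2, 3, 4]
```

Each round moves the current minimum out of the heap, so the sorted part grows by one item while the heap shrinks by one.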

Python’s heapq module already implements some operations for a heap. Because I followed the method in MIT’s lecture, the implementation here differs from Python’s. If you’d like to know the details of Python’s implementation, please visit the source code here. For example, these methods are implemented in Python.

  • heapq.heapify | corresponds to build_min_heap

  • heapq.heappop | corresponds to swapping the items, removing the last item, and running min_heapify, all at once.

By using the methods above, we can implement heapsort as follows. Please note that it differs from the implementation of heapsort in the official documentation.

import heapq

def heapsort(array):
    h = array.copy()
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(array))]
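A quick sanity check of the heapq-based version is to compare it with Python's built-in sorted on random input. The sketch below is a usage example under assumed test data (the seed and value range are arbitrary choices, not from the original post).

```python
import heapq
import random


def heapsort(array):
    h = array.copy()
    heapq.heapify(h)  # build the heap in place, O(n)
    return [heapq.heappop(h) for _ in range(len(array))]


random.seed(0)  # arbitrary seed so the run is repeatable
data = [random.randint(0, 99) for _ in range(20)]
result = heapsort(data)

print(result == sorted(data))  # → True
print(len(data) == 20)         # the input list is left unchanged → True
```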

So that’s all for this post. Thank you for reading!

Implementation

"""
Heaps (priority queues)
"""
# the maximum number of items that can be stored in the heap
CAPACITY = 10

"""
*** Max Heap ***
----------------
"""


# define the heap class
class Heap(object):

    def __init__(self):
        # create array with as many slots as the CAPACITY
        self.heap = [0] * CAPACITY
        # track the size of the heap (the number of items in the heap)
        self.heap_size = 0

    # placing the item takes O(1) running time BUT we have to make sure that
    # the heap properties are not violated (so insert takes O(logN) overall
    # because of the fix_up() method)
    def insert(self, item):
        # if the heap is at CAPACITY already we can not insert any more items
        if CAPACITY == self.heap_size:
            return

        # insert the item at the index of the size of the heap (the last
        # empty spot) and then increment the counter
        self.heap[self.heap_size] = item
        self.heap_size += 1

        # after insert check to see if the heap properties were violated and
        # if so fix them
        self.fix_up(self.heap_size - 1)

    # we consider the last item and check whether swaps are needed or not
    # running time O(logN)
    def fix_up(self, index):

        # get the parent index of the given node in the heap
        parent_index = (index - 1) // 2

        # while the index > 0 means until we consider all the items "above"
        # the one we inserted we have to swap the node with the parent if the
        # heap property is violated
        # this is a MAX HEAP: largest items are in the higher layers (max
        # item == root node)
        if index > 0 and self.heap[index] > self.heap[parent_index]:
            self.swap(index, parent_index)
            # run the check again after the swap on the parent
            self.fix_up(parent_index)

    # Get max, return the root node.  Because this is a max heap the root is
    # the max item.  Because this is an array it takes O(1) time
    # this is the peek() method
    def get_max(self):
        return self.heap[0]

    # Get poll, returns the max item and also REMOVES the item from the heap
    # note: we just dont care about that item anymore but because we have an
    # array with fixed size we aren't able to get rid of it completely
    # O(logN) running time
    def poll(self):

        # nothing to remove from an empty heap
        if self.heap_size == 0:
            return None

        max_item = self.get_max()

        # first swap the first item with the last item
        self.swap(0, self.heap_size - 1)
        # then decrement the heap size (excludes the last item from the heap
        # going forward, thus 'removing' it)
        self.heap_size = self.heap_size - 1

        # next check if the heap properties have been violated and if so fix
        # them (fix down is similar to fix up but works from the root down)
        self.fix_down(0)

        # finally return the max item removed
        return max_item

    # fix down, we have a given item in the heap and we consider all the
    # items below and check whether the heap properties are violated or not
    def fix_down(self, index):

        # every node has at most 2 children, so in the array the node i has
        # left child with index 2*i+1 and right child with index 2*i+2
        index_left = 2 * index + 1
        index_right = 2 * index + 2
        # this is a max heap so the parent is always greater than the children
        index_largest = index

        # if the left child is greater than the parent: largest is the left node
        if index_left < self.heap_size and self.heap[index_left] > self.heap[
            index]:
            index_largest = index_left

        # figure out if the left child or right child is the greater one
        # first check if the given index is valid ( not larger than the heap
        # size)
        # if the right child is greater than the left child: largest is the
        # right node
        if index_right < self.heap_size and self.heap[index_right] > \
                self.heap[index_largest]:
            index_largest = index_right

        # we don't want to swap items with themselves
        if index != index_largest:
            self.swap(index, index_largest)
            # recursively check down the tree for any other heap violations
            # and fix them as needed
            self.fix_down(index_largest)

    # we have N items and we want to sort them with a heap
    # every poll operation takes O(logN) time because of the fix down
    # method, that's why the overall running time is O(NlogN) for heapsort
    def heap_sort(self):

        # we decrease the size of the heap in the poll method so we have to
        # store it
        size = self.heap_size

        for _ in range(size):
            max_item = self.poll()
            print(max_item)

    # swap two items with (index1, index2) in the heap array
    def swap(self, index1, index2):
        self.heap[index2], self.heap[index1] = self.heap[index1], self.heap[index2]

heap = Heap()
heap.insert(10)
heap.insert(8)
heap.insert(12)
heap.insert(20)
heap.insert(-2)
heap.insert(0)
heap.insert(1)
heap.insert(321)

heap.heap_sort()
# Implements a min-heap. For max-heap, simply reverse all comparison orders.
#
# Note on alternate subroutine namings (used in some textbooks):
#     - _bubble_up = siftdown
#     - _bubble_down = siftup

def _bubble_up(heap, i):
    while i > 0:
        parent_i = (i - 1) // 2
        if heap[i] < heap[parent_i]:
            heap[i], heap[parent_i] = heap[parent_i], heap[i]
            i = parent_i
            continue
        break

def _bubble_down(heap, i):
    left_i = 2 * i + 1
    while left_i < len(heap):
        # Pick the smaller of the L and R children
        right_i = left_i + 1
        if right_i < len(heap) and not heap[left_i] < heap[right_i]:
            child_i = right_i
        else:
            child_i = left_i

        # Break if heap invariant satisfied
        if heap[i] < heap[child_i]:
            break

        # Move the smaller child up.
        heap[i], heap[child_i] = heap[child_i], heap[i]
        i = child_i
        left_i = 2 * i + 1

def heapify(lst):
    for i in reversed(range(len(lst) // 2)):
        _bubble_down(lst, i)

def heappush(heap, item):
    heap.append(item)
    _bubble_up(heap, len(heap) - 1)

def heappop(heap):
    if len(heap) == 1:
        return heap.pop()
    min_value = heap[0]
    heap[0] = heap[-1]
    del heap[-1]
    _bubble_down(heap, 0)
    return min_value



# Example usage
heap = [3, 2, 1, 0]
heapify(heap)
print('Heap(0, 1, 2, 3):', heap)
heappush(heap, 4)
heappush(heap, 7)
heappush(heap, 6)
heappush(heap, 5)
print('Heap(0, 1, 2, 3, 4, 5, 6, 7):', heap)

sorted_list = [heappop(heap) for _ in range(8)]
print('Heap-sorted list:', sorted_list)

# Large test case, for randomized tests
import random

# Heapify 0 ~ 99
heap = list(range(100))
random.shuffle(heap)
heapify(heap)

# Push 100 ~ 199 in random order
new_elems = list(range(100, 200))
random.shuffle(new_elems)
for elem in new_elems:
    heappush(heap, elem)

sorted_list = [heappop(heap) for _ in range(200)]
print(sorted_list == sorted(sorted_list))
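The min-heap listing above notes that a max-heap follows from reversing the comparisons. A different route, shown here only as an alternative sketch and not as the method used in the listings, is to keep a min-heap from the standard heapq module and negate the keys. The sample values are the ones inserted into the max heap earlier.

```python
import heapq

values = [10, 8, 12, 20, -2, 0, 1, 321]

# Negating each key turns the smallest stored value into the largest original.
max_heap = [-v for v in values]
heapq.heapify(max_heap)

# Popping and negating back yields the values from largest to smallest.
descending = [-heapq.heappop(max_heap) for _ in range(len(values))]
print(descending)  # → [321, 20, 12, 10, 8, 1, 0, -2]
```

This only works for keys that can be negated, such as numbers; other item types need a custom comparison instead.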