D3-Module 03 - Hash Tables II
Number Bases & Characters
Original:
Objective 01 - Understand hash collisions and use a linked list for collision resolution in a user-defined Hashable class
Overview
Remember when we wondered what would happen if multiple keys hashed to the same index, and we said that we would worry about it later? Whelp, it's later 🤪.
Let's say we were given the key-value pair ("Ryan", 10). Our hash code then maps "Ryan" to index 3. Excellent, that works!Now let's say after we inserted ("Ryan", 10), we have to insert ("Parth", 12). Our hash code maps "Parth" to index 3. Uh oh! Ryan is already there! What do we do?? 😱
Ok, let's stop freaking out, and let's think about this. If we don't do anything, the value stored at index 3 will just get overwritten. Meaning if we try to retrieve the value associated with "Ryan", 12 will be returned instead of 10. That might not seem like a big deal, but what if we were returning passwords based on a user ID, and we returned someone else's password. That would be horrible.
Let's fix this problem. The most common way to solve this is with chaining. If we see multiple values hashed to an index, we will chain them in a some data structure that can hold multiple items. In our case, we'll use Python's list type, but a more typical solution would use a linked list. We'll cover linked lists in a future module.

Ok, sounds ideal? But how does this work in code? Let's write some of it together.
Follow Along
Below is a partially filled out hash table class where we will be using HashTableEntry as our chain entries.
Take a look at the code below.
class HashTableEntry:
    """
    Hash table key/value pair to go in our collision chain
    """
    def __init__(self, key, value):
        self.key = key
        self.value = value
# Hash table can't have fewer than this many slots
MIN_CAPACITY = 8
class HashTable:
    """
    A hash table with `capacity` buckets
    that accepts string keys
    Implement this.
    """
    def __init__(self, capacity):
        self.capacity = capacity  # Number of buckets in the hash table
        self.storage = []
        for _ in range(capacity):   # Initialize with empty lists
            self.storage.append([])
        self.item_count = 0
    def get_num_slots(self):
        """
        Return the length of the list you're using to hold the hash table data. (Not the number of items stored in the hash table,
        but the number of slots in the main list.)
        One of the tests relies on this.
        Implement this.
        """
        # Your code here
    def get_load_factor(self):
        """
        Return the load factor for this hash table.
        Implement this.
        """
        return len(self.storage)
    def djb2(self, key):
        """
        DJB2 hash, 32-bit
        Implement this, and/or FNV-1.
        """
        str_key = str(key).encode()
        hash = FNV_offset_basis_64
        for b in str_key:
            hash *= FNV_prime_64
            hash ^= b
            hash &= 0xffffffffffffffff  # 64-bit hash
        return hash
    def hash_index(self, key):
        """
        Take an arbitrary key and return a valid integer index between within the hash table's storage capacity.
        """
        return self.djb2(key) % self.capacity
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        # Your code here
    def delete(self, key):
        """
        Remove the value stored with the given key.
        Print a warning if the key is not found.
        Implement this.
        """
        # Your code here
    def get(self, key):
        """
        Retrieve the value stored with the given key.
        Returns None if the key is not found.
        Implement this.
        """
        # Your code hereLet's implement the put method with collision resolution by chaining. What are the two cases we need to handle?
- There are no entries at the index. Great! We can initialize the entry to a list with the new - HashTableEntryin it.
- There are multiple entries at the index. We need to check every entry in the chain. If the key in one of the entries is equal to the key we are passing in, we need to replace it. For instance, let's say we pass in - ("Ryan", 12),and then we later pass in- ("Ryan", 15). We would need to replace "Ryan"'s old value with 15. If there are no entries that match, we create a new entry at the end of the chain.
Ok, that might sound confusing. Let's start breaking it down into code.
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        # Your code hereFirst, we need to hash the key and start with the first entry at that index.
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
        chain = self.storage[index]Next, we need to go through the chain. We need to check two conditions:
- The current entry is not empty. 
- The key or the current entry is not equal to the key we are passing in. 
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
        chain = self.storage[index]
        existing_entry = None
        for current_entry in chain:
            if current_entry.key == key:
                exiting_entry = current_entry
                breakSweet! Now we need to check what happens when the loop breaks. It would only break for two reasons:
- We reached an entry with the same key and need to replace the value. 
- We reached the end of the chain and need to create a new entry. 
Let's write that in code!
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
        chain = self.storage[index]
        existing_entry = None
        for current_entry in chain:
            if current_entry.key == key:
                existing_entry = current_entry
                break
        if existing_entry is not None:
            existing_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            chain.append(new_entry)Great! We created the put method.
Challenge
https://replit.com/@bgoonz/cs-unit-1-sprint-4-module-2-hash-table-collision-resolution#main.py

Objective 02 - Define and compute the load factor of a hash table and implement a hash table that automatically resizes based on load factor
Overview
What does runtime look like with linked list chaining?
The performance of hash tables for search, insertion, and deletion is constant time (O(1)) in the average case. However, as the chains get longer and longer, in the worst case, those same operations are done in linear time (O(n)). The more collisions that your hash table has, the less performant the hash table is. To avoid collisions, a proper hash function and maintaining a low load factor is crucial. What is a load factor?
Load Factor
The load factor of a hash table is trivial to calculate. You take the number of items stored in the hash table divided by the number of slots.

Hash tables use an array for storage. So, the load factor is the number of occupied slots divided by the length of the array. So, an array of length 10 with three items in it has a load factor of 0.3, and an array of length 20 with twenty items has a load factor of 1. If you use linear probing for collision resolution, then the maximum load factor is 1. If you use chaining for collision resolution, then the load factor can be greater than 1.
As the load factor of your hash table increases, so does the likelihood of a collision, which reduces your hash table's performance. Therefore, you need to monitor the load factor and resize your hash table when the load factor gets too large. The general rule of thumb is to resize your hash table when your load factor is greater than 0.7. Also, when you resize, it is common to double the size of the hash table. When you resize the array, you need to re-insert all of the items into this new hash table. You cannot simply copy the old items into the new hash table. Each item has to be rerun through the hashing function because the hashing function considers the size of the hash table when determining the index that it returns.
You can see that resizing is an expensive operation, so you don’t want to resize too often. However, when we average it out, hash tables are constant time (O(1)) even with resizing.
The load factor can also be too small. If the hash table is too large for the data that it is storing, then memory is being wasted. So, in addition to resizing, when the load factor gets too high, you should also resize when the load factor gets too low.
One way to know when to resize your hash table is to compute the load factor whenever an item is inserted or deleted into the hash table. If the load factor is too high or too low, then you need to resize.
We added a get_load_factor and resize method to calculate the load factor and resize the hash table with a new capacity when necessary.
class HashTableEntry:
    """
    Linked List hash table key/value pair
    """
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None
# Hash table can't have fewer than this many slots
MIN_CAPACITY = 8
class HashTable:
    """
    A hash table with `capacity` buckets
    that accepts string keys
    Implement this.
    """
    def __init__(self, capacity):
        self.capacity = capacity  # Number of buckets in the hash table
        self.storage = [None] * capacity
        self.item_count = 0
    def get_num_slots(self):
        """
        Return the length of the list you're using to hold the hash
        table data. (Not the number of items stored in the hash table,
        but the number of slots in the main list.)
        One of the tests relies on this.
        Implement this.
        """
        # Your code here
    def get_load_factor(self):
        """
        Return the load factor for this hash table.
        Implement this.
        """
        return self.item_count / self.capacity
    def resize(self, new_capacity):
        """
        Changes the capacity of the hash table and
        rehashes all key/value pairs.
        Implement this.
        """
        old_storage = self.storage
        self.capacity = new_capacity
        self.storage = [None] * self.capacity
        current_entry = None
        # Save this because put adds to it, and we don't want that.
        # It might be less hackish to pass a flag to put indicating that
        # we're in a resize and don't want to modify item count.
        old_item_count = self.item_count
        for bucket_item in old_storage:
            current_entry = bucket_item
            while current_entry is not None:
                self.put(current_entry.key, current_entry.value)
                current_entry = current_entry.next
        # Restore this to the correct number
        self.item_count = old_item_count
    def djb2(self, key):
        """
        DJB2 hash, 32-bit
        Implement this, and/or FNV-1.
        """
        str_key = str(key).encode()
        hash = FNV_offset_basis_64
        for b in str_key:
            hash *= FNV_prime_64
            hash ^= b
            hash &= 0xffffffffffffffff  # 64-bit hash
        return hash
    def hash_index(self, key):
        """
        Take an arbitrary key and return a valid integer index
        within the hash table's storage capacity.
        """
        return self.djb2(key) % self.capacity
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
        current_entry = self.storage[index]
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
    def delete(self, key):
        """
        Remove the value stored with the given key.
        Print a warning if the key is not found.
        Implement this.
        """
        # Your code here
    def get(self, key):
        """
        Retrieve the value stored with the given key.
        Returns None if the key is not found.
        Implement this.
        """
        # Your code hereFollow Along
Let's change our put method to resize when the load factor gets too high. Here's how our current put method looks:
def put(self, key, value):
    """
    Store the value with the given key.
    Hash collisions should be handled with Linked List Chaining.
    Implement this.
    """
    index = self.hash_index(key)
    current_entry = self.storage[index]
    while current_entry is not None and current_entry.key != key:
        current_entry = current_entry.next
    if current_entry is not None:
        current_entry.value = value
    else:
        new_entry = HashTableEntry(key, value)
        new_entry.next = self.storage[index]
        self.storage[index] = new_entryTo know when to resize, we need to correctly increment the count whenever we insert something new into the hash table. Let's go ahead and add that.
def put(self, key, value):
    """
    Store the value with the given key.
    Hash collisions should be handled with Linked List Chaining.
    Implement this.
    """
    index = self.hash_index(key)
    current_entry = self.storage[index]
    while current_entry is not None and current_entry.key != key:
        current_entry = current_entry.next
    if current_entry is not None:
        current_entry.value = value
    else:
        new_entry = HashTableEntry(key, value)
        new_entry.next = self.storage[index]
        self.storage[index] = new_entry
        self.item_count += 1Next, we need to check if the load factor is greater than or equal to 0.7. If it is, we need to double our capacity and resize.
def put(self, key, value):
    """
    Store the value with the given key.
    Hash collisions should be handled with Linked List Chaining.
    Implement this.
    """
    index = self.hash_index(key)
    current_entry = self.storage[index]
    while current_entry is not None and current_entry.key != key:
        current_entry = current_entry.next
    if current_entry is not None:
        current_entry.value = value
    else:
        new_entry = HashTableEntry(key, value)
        new_entry.next = self.storage[index]
        self.storage[index] = new_entry
        self.item_count += 1
        if self.get_load_factor() > 0.7:
            self.resize(self.capacity * 2)Fantastic, we did it!

Objective 02 - Define and compute the load factor of a hash table and implement a hash table that automatically resizes based on load factor
Overview
What does runtime look like with linked list chaining?
The performance of hash tables for search, insertion, and deletion is constant time (O(1)) in the average case. However, as the chains get longer and longer, in the worst case, those same operations are done in linear time (O(n)). The more collisions that your hash table has, the less performant the hash table is. To avoid collisions, a proper hash function and maintaining a low load factor is crucial. What is a load factor?
Load Factor
The load factor of a hash table is trivial to calculate. You take the number of items stored in the hash table divided by the number of slots.

Hash tables use an array for storage. So, the load factor is the number of occupied slots divided by the length of the array. So, an array of length 10 with three items in it has a load factor of 0.3, and an array of length 20 with twenty items has a load factor of 1. If you use linear probing for collision resolution, then the maximum load factor is 1. If you use chaining for collision resolution, then the load factor can be greater than 1.
As the load factor of your hash table increases, so does the likelihood of a collision, which reduces your hash table's performance. Therefore, you need to monitor the load factor and resize your hash table when the load factor gets too large. The general rule of thumb is to resize your hash table when your load factor is greater than 0.7. Also, when you resize, it is common to double the size of the hash table. When you resize the array, you need to re-insert all of the items into this new hash table. You cannot simply copy the old items into the new hash table. Each item has to be rerun through the hashing function because the hashing function considers the size of the hash table when determining the index that it returns.
You can see that resizing is an expensive operation, so you don’t want to resize too often. However, when we average it out, hash tables are constant time (O(1)) even with resizing.
The load factor can also be too small. If the hash table is too large for the data that it is storing, then memory is being wasted. So, in addition to resizing, when the load factor gets too high, you should also resize when the load factor gets too low.
One way to know when to resize your hash table is to compute the load factor whenever an item is inserted or deleted into the hash table. If the load factor is too high or too low, then you need to resize.
We added a get_load_factor and resize method to calculate the load factor and resize the hash table with a new capacity when necessary.
class HashTableEntry:
    """
    Linked List hash table key/value pair
    """
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None
# Hash table can't have fewer than this many slots
MIN_CAPACITY = 8
class HashTable:
    """
    A hash table with `capacity` buckets
    that accepts string keys
    Implement this.
    """
    def __init__(self, capacity):
        self.capacity = capacity  # Number of buckets in the hash table
        self.storage = [None] * capacity
        self.item_count = 0
    def get_num_slots(self):
        """
        Return the length of the list you're using to hold the hash
        table data. (Not the number of items stored in the hash table,
        but the number of slots in the main list.)
        One of the tests relies on this.
        Implement this.
        """
        # Your code here
    def get_load_factor(self):
        """
        Return the load factor for this hash table.
        Implement this.
        """
        return self.item_count / self.capacity
    def resize(self, new_capacity):
        """
        Changes the capacity of the hash table and
        rehashes all key/value pairs.
        Implement this.
        """
        old_storage = self.storage
        self.capacity = new_capacity
        self.storage = [None] * self.capacity
        current_entry = None
        # Save this because put adds to it, and we don't want that.
        # It might be less hackish to pass a flag to put indicating that
        # we're in a resize and don't want to modify item count.
        old_item_count = self.item_count
        for bucket_item in old_storage:
            current_entry = bucket_item
            while current_entry is not None:
                self.put(current_entry.key, current_entry.value)
                current_entry = current_entry.next
        # Restore this to the correct number
        self.item_count = old_item_count
    def djb2(self, key):
        """
        DJB2 hash, 32-bit
        Implement this, and/or FNV-1.
        """
        str_key = str(key).encode()
        hash = FNV_offset_basis_64
        for b in str_key:
            hash *= FNV_prime_64
            hash ^= b
            hash &= 0xffffffffffffffff  # 64-bit hash
        return hash
    def hash_index(self, key):
        """
        Take an arbitrary key and return a valid integer index
        within the hash table's storage capacity.
        """
        return self.djb2(key) % self.capacity
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
        current_entry = self.storage[index]
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
    def delete(self, key):
        """
        Remove the value stored with the given key.
        Print a warning if the key is not found.
        Implement this.
        """
        # Your code here
    def get(self, key):
        """
        Retrieve the value stored with the given key.
        Returns None if the key is not found.
        Implement this.
        """
        # Your code hereFollow Along
Let's change our put method to resize when the load factor gets too high. Here's how our current put method looks:
def put(self, key, value):
    """
    Store the value with the given key.
    Hash collisions should be handled with Linked List Chaining.
    Implement this.
    """
    index = self.hash_index(key)
    current_entry = self.storage[index]
    while current_entry is not None and current_entry.key != key:
        current_entry = current_entry.next
    if current_entry is not None:
        current_entry.value = value
    else:
        new_entry = HashTableEntry(key, value)
        new_entry.next = self.storage[index]
        self.storage[index] = new_entryTo know when to resize, we need to correctly increment the count whenever we insert something new into the hash table. Let's go ahead and add that.
def put(self, key, value):
    """
    Store the value with the given key.
    Hash collisions should be handled with Linked List Chaining.
    Implement this.
    """
    index = self.hash_index(key)
    current_entry = self.storage[index]
    while current_entry is not None and current_entry.key != key:
        current_entry = current_entry.next
    if current_entry is not None:
        current_entry.value = value
    else:
        new_entry = HashTableEntry(key, value)
        new_entry.next = self.storage[index]
        self.storage[index] = new_entry
        self.item_count += 1Next, we need to check if the load factor is greater than or equal to 0.7. If it is, we need to double our capacity and resize.
def put(self, key, value):
    """
    Store the value with the given key.
    Hash collisions should be handled with Linked List Chaining.
    Implement this.
    """
    index = self.hash_index(key)
    current_entry = self.storage[index]
    while current_entry is not None and current_entry.key != key:
        current_entry = current_entry.next
    if current_entry is not None:
        current_entry.value = value
    else:
        new_entry = HashTableEntry(key, value)
        new_entry.next = self.storage[index]
        self.storage[index] = new_entry
        self.item_count += 1
        if self.get_load_factor() > 0.7:
            self.resize(self.capacity * 2)Fantastic, we did it!
Challenge
- Do we need to modify our - deleteand- getmethods to account for the new- get_load_factorand- resizemethods? Why or why not?
Additional Resources
Homework:
Given a string text, you need to use the characters of text to form as many instances of the word "lambda" as possible.
You can use each character in text at most once.
Write a function that returns the maximum number of instances of "lambda" that can be formed.
Input: text = "mbxcdatlas"
Output: 1
Example 2:
Input: text = "lalaaxcmbdtsumbdav"
Output: 2
Example 3:
Input: text = "sctlamb"
Output: 0
Notes:
text consists of lowercase English characters only
[execution time limit] 4 seconds (py3)
[input] string text
[output] integer
def csMaxNumberOfLambdas(text):
    sub_string = "lambda"
    lambda_count = {"l": 0, "a": 0, "m": 0, "b": 0, "d": 0, "a": 0}
    counts = []
    for letter in text:
        if letter in lambda_count:
            lambda_count[letter] += 1
    for key, value in lambda_count.items():
        counts.append(value)
    return min(counts)
Last updated
Was this helpful?
