Main article: Associative array
Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays (arrays whose indices are arbitrary strings or other complicated objects), especially in interpreted programming languages like Ruby, Python, and PHP.
When storing a new item into a multimap and a hash collision occurs, the multimap unconditionally stores both items.
When storing a new item into a typical associative array and a hash collision occurs, but the actual keys themselves are different, the associative array likewise stores both items. However, if the key of the new item exactly matches the key of an old item, the associative array typically erases the old item and overwrites it with the new item, so every item in the table has a unique key.
Hash tables may also be used as disk-based data structures and database indices (such as in dbm) although B-trees are more popular in these applications. In multi-node database systems, hash tables are commonly used to distribute rows amongst nodes, reducing network traffic for hash joins.
Main article: Cache (computing)
Hash tables can be used to implement caches, auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries—usually erasing the old item that is currently stored in the table and overwriting it with the new item, so every item in the table has a unique hash value.
Besides recovering the entry that has a given key, many hash table implementations can also tell whether such an entry exists or not.
Those structures can therefore be used to implement a set data structure,[43] which merely records whether a given key belongs to a specified set of keys. In this case, the structure can be simplified by eliminating all parts that have to do with the entry values. Hashing can be used to implement both static and dynamic sets.
Several dynamic languages, such as Perl, Python, JavaScript, Lua, and Ruby, use hash tables to implement objects. In this representation, the keys are the names of the members and methods of the object, and the values are pointers to the corresponding member or method.
Main article: String interning
Hash tables can be used by some programs to avoid creating multiple character strings with the same contents. For that purpose, all strings in use by the program are stored in a single string pool implemented as a hash table, which is checked whenever a new string has to be created. This technique was introduced in Lisp interpreters under the name hash consing, and can be used with many other kinds of data (expression trees in a symbolic algebra system, records in a database, files in a file system, binary decision diagrams, etc.).
Main article: Transposition table
A transposition table is a complex hash table which stores information about each section that has been searched.[44]
Data can be stored and accessed in a number of ways, and one of the most important structures for doing so is the hash table. In Python, hash tables are implemented through the built-in dictionary data type. In this article, you will learn what hash tables and hashmaps are in Python and how you can implement them using dictionaries.
Before moving ahead, let us take a look at all the topics of discussion:
What is a Hash table or a Hashmap in Python?
Hash table vs Hashmap
Creating Dictionaries
Creating Nested Dictionaries
Performing Operations on Hash Tables using dictionaries
Accessing Values
Updating Values
Deleting Items
Converting a Dictionary into a Dataframe
In computer science, a hash table or a hashmap is a data structure that maps keys to values (it implements the abstract associative array data type). It makes use of a function that computes an index from each key; that index determines where the element is stored, searched for, inserted, or removed. This makes accessing data easy and fast. In general, hash tables store key-value pairs, and the index for each key is generated using a hash function.
Hash tables or hash maps in Python are implemented through the built-in dictionary data type. The position of each key in a dictionary is determined by a hashing function. Dictionaries are mutable, and since Python 3.7 they preserve insertion order.
An example of a dictionary can be a mapping of employee names and their employee IDs or the names of students along with their student IDs.
Moving ahead, let’s see the difference between the hash table and hashmap in Python.
Dictionaries can be created in two ways:
Using curly braces ({})
Using the dict() function
Dictionaries in Python can be created using curly braces as follows:
EXAMPLE:
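A minimal sketch of what the elided example presumably looked like (the variable name my_dict is illustrative); it prints the dictionary and its type name so it lines up with the output below:

```python
# Create a dictionary using curly braces
my_dict = {'Dave': '001', 'Ava': '002', 'Joe': '003'}

print(my_dict)
print(type(my_dict).__name__)
```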
OUTPUT:
{'Dave': '001', 'Ava': '002', 'Joe': '003'}
dict
Python has a built-in function, dict() that can be used to create dictionaries in Python. For example:
EXAMPLE:
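A sketch of the kind of code implied here, calling dict() with no arguments:

```python
# Create an empty dictionary using the dict() constructor
my_dict = dict()

print(my_dict)
print(type(my_dict).__name__)
```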
OUTPUT:
{}
dict
In the above example, an empty dictionary is created since no key-value pairs are supplied as a parameter to the dict() function. In case you want to add values, you can do as follows:
EXAMPLE:
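One way to supply key-value pairs to dict() that would produce the output below (keyword-argument and iterable-of-pairs forms also exist):

```python
# Create a dictionary by passing a mapping to dict()
my_dict = dict({'Dave': '001', 'Ava': '002', 'Joe': '003'})

print(my_dict)
print(type(my_dict).__name__)
```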
OUTPUT:
{'Dave': '001', 'Ava': '002', 'Joe': '003'}
dict
Nested dictionaries are basically dictionaries that lie within other dictionaries. For example:
EXAMPLE:
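A hedged sketch of a nested dictionary; the detail fields shown here (ID, age, department) are illustrative, not from the original example:

```python
# A nested dictionary: each employee name maps to another dictionary of details
emp_details = {
    'Dave': {'ID': '001', 'Age': 30, 'Department': 'Sales'},
    'Ava':  {'ID': '002', 'Age': 26, 'Department': 'Marketing'},
    'Joe':  {'ID': '003', 'Age': 31, 'Department': 'HR'},
}

print(emp_details)
```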
There are a number of operations that can be performed on hash tables in Python through dictionaries, such as:
Accessing Values
Updating Values
Deleting Element
The values of a dictionary can be accessed in many ways such as:
Using key values
Using functions
Implementing the for loop
Dictionary values can be accessed using the key values as follows:
EXAMPLE:
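A sketch of key-based access (shown as an interactive session so the quoted value matches the output below):

```python
>>> my_dict = {'Dave': '001', 'Ava': '002', 'Joe': '003'}
>>> my_dict['Dave']
'001'
```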
OUTPUT: '001'
There are a number of built-in functions that can be used such as get(), keys(), values(), etc.
EXAMPLE:
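A sketch using keys(), values(), and get() that lines up with the output below:

```python
my_dict = {'Dave': '001', 'Ava': '002', 'Joe': '003'}

print(my_dict.keys())       # all keys
print(my_dict.values())     # all values
print(my_dict.get('Dave'))  # value for a single key
```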
OUTPUT:
dict_keys(['Dave', 'Ava', 'Joe'])
dict_values(['001', '002', '003'])
001
The for loop allows you to access the key-value pairs of a dictionary easily by iterating over them. For example:
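A sketch of iterating over keys, values, and key-value pairs, matching the output below:

```python
my_dict = {'Dave': '001', 'Ava': '002', 'Joe': '003'}

print('All keys')
for key in my_dict:
    print(key)

print('All values')
for value in my_dict.values():
    print(value)

print('All keys and values')
for key, value in my_dict.items():
    print(key, ':', value)
```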
OUTPUT:
All keys
Dave
Ava
Joe
All values
001
002
003
All keys and values
Dave : 001
Ava : 002
Joe : 003
Dictionaries are mutable data types and therefore, you can update them as and when required. For example, if I want to change the ID of the employee named Dave from ‘001’ to ‘004’ and if I want to add another key-value pair to my dictionary, I can do as follows:
EXAMPLE:
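A sketch of updating an existing key and adding a new key-value pair, matching the output below:

```python
my_dict = {'Dave': '001', 'Ava': '002', 'Joe': '003'}

my_dict['Dave'] = '004'    # change an existing value
my_dict['Chris'] = '005'   # add a new key-value pair

print(my_dict)
```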
OUTPUT: {'Dave': '004', 'Ava': '002', 'Joe': '003', 'Chris': '005'}
There are a number of tools that allow you to delete items from a dictionary, such as the del keyword, pop(), popitem(), clear(), etc. For example:
EXAMPLE:
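A sketch that uses a few of these removal tools so only 'Joe': '003' is left, matching the output below (exactly which item is removed by which function is an assumption):

```python
my_dict = {'Dave': '004', 'Ava': '002', 'Joe': '003', 'Chris': '005'}

del my_dict['Dave']     # remove a specific key with del
my_dict.pop('Ava')      # remove a key and return its value
my_dict.popitem()       # remove the last-inserted pair ('Chris': '005')

print(my_dict)
```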
OUTPUT: {'Joe': '003'}
The above output shows that all the elements except ‘Joe: 003’ have been removed from the dictionary using the various functions.
As you saw previously, I created a nested dictionary containing employee names and their details. Now, to make a clear table out of that, I will use the pandas library to put everything into a DataFrame.
EXAMPLE:
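A hedged sketch of turning the nested employee dictionary into a pandas DataFrame (the detail fields are the illustrative ones used above, and pandas must be installed):

```python
import pandas as pd

emp_details = {
    'Dave': {'ID': '001', 'Age': 30, 'Department': 'Sales'},
    'Ava':  {'ID': '002', 'Age': 26, 'Department': 'Marketing'},
    'Joe':  {'ID': '003', 'Age': 31, 'Department': 'HR'},
}

# Each outer key becomes a column; transpose so employees become rows
df = pd.DataFrame(emp_details).transpose()
print(df)
```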
OUTPUT:
Hash tables are also called hash maps, maps, unordered maps, or dictionaries. A hash table is a structure that maps keys to values. This makes them extremely efficient for lookups because if you have the key, retrieving the associated value is a constant-time operation.
Hash tables have fast lookups (O(1)) on average. However, in the worst case, they have slow (O(n)) lookups. The slow lookups happen when there is a hash collision (two different keys hash to the same index).
Hash tables have fast insertions (O(1)) on average. However, in the worst case, they have slow (O(n)) insertions. Just like with the lookups, the worst case occurs due to hash collisions.
Hash tables have fast deletes (O(1)) on average. However, in the worst case, they have slow (O(n)) deletions. Just like with lookups and insertions, the worst case occurs due to hash collisions.
The space complexity of a hash table is linear (O(n)). Each key-value pair in the hash table will take up space in memory.
The main reason why hash tables are great is that they have constant-time (O(1)) lookup operations in the average case. That makes them great to use in any situation where you will be conducting many lookup operations. The second reason they are great is that they allow you to use any hashable object as a key. This means they can be used in many different scenarios where you want to map one object (the key) to another object (the value).
One weakness of the hash table is that the mapping goes only one way. So, if you know the key, it's incredibly efficient to retrieve the mapped value to that key. However, if you know the value and want to find the key that is mapped to that value, it is inefficient. Another weakness is that if your hash function produces lots of collisions, the hash table's time complexity gets worse and worse. This is because the underlying linked lists are inefficient for lookups.
A hash collision is when our hash function returns the same index given two different keys. Theoretically, there is no perfect hash function, though some are better than others. Thus, any hash table implementation has to have a strategy to deal with the scenario when two different keys hash to the same index. You can't just have the hash table overwrite the already existing value.
The most common strategy for dealing with hash collisions is not storing the values directly at an index of the hash table's array. Instead, the array index stores a pointer to a linked list. Each node in the linked list stores a key, value, and a pointer to the next item in the linked list.
The above is just one of the ways to deal with hash collisions. Hopefully, you can now see why all of our hash table operations become O(n) in the worst case. What is the worst case? The worst case is when all of the keys collide at the same index in our hash table. If we have ten items in our hash table, all ten items are stored in one linked list at the same index of our array. That means that our hash table has the same efficiency as a linked list in the worst case.
In your own words, explain how and why the time complexity of hash table operations degrades to O(n) in the worst case.
Hashing functions take an input (usually a string) and return an integer as the output. Let's say we needed to store five colors in our hash table. Currently, we have an empty table that looks like this:
Now, I need to assign an index given the name of a color. Our hash function will take the name of a color and convert it into an index.
So, hash functions convert the strings into indexes, and then we store the given string into the computed index of an array.
Hash Function + Array = Hash Table
Now that we know what a hash table is let's dive deeper into creating a hash function.
To convert a string into an integer, hashing functions operate on the individual characters that make up the string.
Let's use what we know to create a hashing function in Python.
In Python, we can encode strings into their bytes representation with the .encode() method. Once encoded, an integer represents each character.
Let's do this with the string hello:
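A quick sketch of that encoding step in an interactive session:

```python
>>> 'hello'.encode()
b'hello'
>>> list('hello'.encode())
[104, 101, 108, 108, 111]
```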
Now that we’ve converted our string into a series of integers, we can manipulate those integers somehow. For simplicity’s sake, we can use a simple accumulator pattern to get a sum of all the integer values.
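Summing those integers with a simple accumulator gives us a single number:

```python
>>> total = 0
>>> for byte in 'hello'.encode():
...     total += byte
...
>>> total
532
```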
Great! We turned a string into a number. Now, let's generalize this into a function.
We aren't done yet 🤪. As shown earlier, hello returns 532. But, what if our hash table only has ten slots? We have to make 532 a number less than 10 😱.
Remember the modulo operator %? We can use that in our hashing function to ensure that the integer the function returns is within a specific range.
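A sketch of a generalized hashing function that stays within the bounds of a ten-slot table (the function and parameter names are illustrative):

```python
def my_hashing_function(string, table_size=10):
    # Sum the byte values of the string, then wrap the sum into the
    # table's range with the modulo operator.
    total = 0
    for byte in string.encode():
        total += byte
    return total % table_size


print(my_hashing_function('hello'))  # 532 % 10 -> 2
```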
https://replit.com/@bgoonz/cs-unit-1-sprint-4-module-1-hash-function#main.py
We define a hash table as an empty array and a hash function as a function that takes a value and converts it into an array index where you will store that value. Let's put the two together. Let's implement a HashTable class where we can:
Insert values into a hash table
Retrieve values from a hash table
Delete values from a hash table
Let's start with the insert function. For an insert, I need to insert a value with an associated key. Let's store the instructors at Lambda and where they live. We want to store:
("Parth", "California")
("Beej", "Oregon")
("Dustin", "Utah")
("Ryan", "Utah")
Here's what our HashTable class looks like right now:
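A sketch of that starting point, using the djb2-based hash_index described just below (attribute names like capacity and storage are illustrative):

```python
class HashTable:
    def __init__(self, capacity):
        self.capacity = capacity          # number of slots in the table
        self.storage = [None] * capacity  # the empty underlying array

    def hash_index(self, key):
        # Hash the key with djb2, then map the result to a valid index
        hash_value = 5381
        for char in key:
            hash_value = (hash_value * 33) + ord(char)
        return hash_value % self.capacity
```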
Let's break this down a little bit. Our init function takes in the length of a hash table and creates an empty array. Our hash_index function takes a key and computes an index using the famous djb2 hash function. Let's implement the other functions (put, delete, get).
The put Method
Let's create our put function. Before we code, let's break down what needs to happen:
Given a key and a value, insert the respective value into a hash table array using the hashed key to determine the storage location index.
Let's think about what we need to do:
Hash the key into an index using the hash function
Put the value into that index
You might be thinking, "What if two keys hash to the same index?" That's a great question, and we will worry about that later. It's a nifty solution 🎉. But for now, let's worry about hashing a key and storing a value.
First, let's call the hash function and store the return value in index. Next, let's insert the value at that index:
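Putting both steps together, a minimal put might look like this (no collision handling yet):

```python
class HashTable:
    # ... __init__ and hash_index as sketched above ...

    def put(self, key, value):
        # Step 1: hash the key into an index
        index = self.hash_index(key)
        # Step 2: store the value at that index
        self.storage[index] = value
```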
There we go! Given a key, we hashed it and inserted a value. Again, we will worry about colliding indices later 😀.
The delete Method
Next, let's write our delete method. What does this method need to do? We can think about it as the inverse of the put method that we just defined. The function will receive a key as its input, then pass that key through the hash function to get the index where the hash table's value needs to be deleted.
Let's start by getting the index by passing the key through the hashing function. Next, we need to delete the value from that index in our storage by setting it to None. Remember, we aren't dealing with collisions in this example. If we had to deal with collisions, this would be more complex.
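A matching sketch of delete under those assumptions:

```python
class HashTable:
    # ... __init__, hash_index, and put as sketched above ...

    def delete(self, key):
        # Hash the key to find where its value lives
        index = self.hash_index(key)
        # "Delete" by clearing the slot (no collision handling here)
        self.storage[index] = None
```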
The get Method
The last method we need to deal with is our get method. get is a simple method that retrieves the value stored at a specific key. The function needs to receive a key as an input, pass that key through the hashing function to find the index where the value is stored, and then return the value at that index.
Let's start by getting the index from the key. Next, we need to return the value that is stored at that index.
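And a sketch of get:

```python
class HashTable:
    # ... other methods as sketched above ...

    def get(self, key):
        # Hash the key, then return whatever is stored at that index
        index = self.hash_index(key)
        return self.storage[index]
```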
In this sprint, you will learn about number bases, character encoding, and how values are stored in memory. We'll also talk all about hash tables, a very powerful data structure for getting more performance and flexibility out of your algorithms. Finally, we'll look at optimizing searching through data, as well as the powerful programming technique of recursion.
This module will teach about RAM, number bases, fixed-width integers, how arrays are stored in memory, and character encoding. These basics allow you to better understand how other data structures are stored in memory and have an intuitive sense of the time complexity of specific data structures' operations.
In this module, you will learn about the properties of hash tables, hash functions, and how to implement a hash table in Python. Hash tables are arguably one of the most important and common data structures you use and encounter to solve algorithmic coding challenges.
In this module, you will learn about hash collisions and implement a complete hash table in Python that takes collisions into account.
This module will teach about logarithms, linear search, binary search, and recursion. This material sets the stage for learning about traversal techniques of trees and graphs. Additionally, using a binary search is a common optimization technique that may come up during a technical interview.
Your computer has something called random access memory (RAM). Sometimes, people say "memory" when referring to RAM.
One thing that might come to your mind is that there are different types of memory on your computer. What about storing things like videos, text documents, and applications? Are those in "memory"? There is a distinction between "storage" and "memory". Things like videos and files are stored on a disc, not in RAM. RAM is faster than disc storage, but there isn't as much space available. Disc storage has more space, but it is slower.
Think of RAM like a set of numbered, sequential mailboxes. Just like a set of mailboxes with numbered addresses, RAM is also sequential and has numbered addresses.
Now, just like you can put something in a mailbox, you can also put something in RAM. Things that you put in RAM, we can call variables. Each "box" in RAM has an address.
Each one of the "boxes" (memory addresses) in our set of mailboxes (RAM) holds 8 bits. You can think of each bit like a tiny switch that can either be "on" or "off." "On" is represented by a 1, and "off" is represented by a 0.
Bits are often thought about in groups. A group of 8 bits is called a byte. Each "box" in RAM can hold 1 byte (8 bits).
Now, a computer has more than just disc storage and RAM inside of it. There is also a processor. And, in between the processor and the RAM is something called a memory controller. The memory controller can access each box in RAM directly. It is as if the memory controller had tubes connected to each box of the set of mailboxes. Through those tubes, the memory controller can send and receive information directly to each box in RAM.
Why is the direct connection between the memory controller and each box in RAM meaningful? It's so that the memory controller can jump around to which box it wants to communicate with quickly. Even though the boxes are in sequential order, the memory controller doesn't have to go through the boxes in order. It can access the first one, then jump to one somewhere in the middle, and then access one at the end. Because there is a direct connection, this is done quickly.
Whenever you use a computer, you are very concerned with the speed of the computer you are using. So, computer designers made a way to optimize for speed when accessing items in RAM. Whenever a processor accesses a box in RAM, it also accesses and stores the boxes near it. Often, if you are accessing one thing in RAM, it's likely that the next thing you need to access is nearby. That's why keeping a copy of nearby items in the cache speeds things up.
Whenever the processor reads something (say, the player's position in an old adventure game) out of RAM, it adds it to the cache to use it again in the future. Then, when it needs something else from RAM, it will go to the cache for it. As you can see, the cache helps the processor by saving execution cycles required to go out and read something from RAM.
The processor, not RAM, has the actual cache. The memory controller keeps track of what goes into and comes out of the cache.
We can think of it in several ways. Perhaps, the processor can use the cache as a temporary area to keep a copy of its last actions just in case it needs to reread them.
There is one caveat — it is not as if "everything" goes out to RAM and then gets inserted into the cache. In reality, the cache holds only a handful of memory addresses from RAM. Also, note that these few memory addresses in the cache can be accessed faster than other storage locations.
Draw a model of how a processor interacts with the cache, memory controller, and RAM whenever it needs to read or write from memory.
Computers use the binary number system, so we will represent all of our variables in the binary number system.
Instead of 10 digits like 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0, the binary number system only has two possible digits, 1 and 0. Another way to think of it is that computers only have switches (bits) that can be in an "off state" or an "on state."
Before we try to understand the binary number system, let's review how the decimal number system works. Let's look at the number "1001" in decimal.
Even though there are two "1" digits in this number, they don't represent the same quantity. The leftmost "1" represents one thousand, and the rightmost "1" represents one unit. The "0"s in-between represent the tens place and the hundreds place.
So this "1001" in base ten represents "1 thousand, 0 hundreds, 0 tens, and 1 one."
Each successive digit in the base 10 number system is a power of ten. The ones place is 10^0 = 1. The tens place is 10^1 = 10. The hundreds place is 10^2 = 100. This pattern continues on and on.
This pattern holds for other number systems as well. In the binary system, each successive digit represents a different power of 2. The first digit represents 2^0 = 1. The second digit represents 2^1 = 2. The third digit represents 2^2 = 4. Again, this pattern continues on and on.
So, what if the number "1001" was in binary and not decimal? What would it represent then? Well, if we read it right to left, we have a "1" in the ones place, a "0" in the twos place, a "0" in the fours place, and a "1" in the eights place. We add these values up (8 + 0 + 0 + 1) which equals 9.
Below, is a table that shows how to count up to 8 in binary and decimal:
Decimal | Binary
0 | 0000
1 | 0001
2 | 0010
3 | 0011
4 | 0100
5 | 0101
6 | 0110
7 | 0111
8 | 1000
Convert the following decimal numbers into binary numbers:
25
63
9
111
We now know that things are stored in RAM using binary, and each "box" in RAM holds 1 byte (8 bits). What does that mean for what we can store in RAM? Let's say we have 1 byte of RAM to use. How many different numbers can we represent using only this 1 byte?
Remember that each digit in a binary number is a successive power of 2. If we have 8 bits to use, we can store 2^8 = 256 different numbers in 1 byte.
Let's see if we can find a pattern:
With one bit, we can express two numbers (0 and 1)
With two bits, for each of the first numbers (0 or 1), we can put a 0 or a 1 after it, so we can express four numbers
With three bits, we can express eight numbers.
Every time we add a new bit, we double the number of possible numbers we can express in binary. This pattern can be generalized as 2^n, and 2^8 = 256.
Often, computers use 4 bytes (32 bits) to represent our variables, meaning that we can express about 4 billion (2^32) possible values. Similarly, computers may use 8 bytes (64 bits) to represent our variables and can express over 18 quintillion (2^64) possible values.
The n in 2^n is called the bit size. Integers stored in 8 bits are called "8-bit" integers, integers stored in 16 bits are called "16-bit," and so on.
In theory, you could use less space to represent smaller integers. For instance, in binary, the number one is represented by 1. So, technically, to store one in binary, you only need one bit. But computers don't usually do this. Many integers take a fixed amount of space, no matter what number they might have in them. So, even though you only need one bit to represent the number one, the computer would still use 32 or 64 bits to do so.
So, if a variable represents a fixed-width integer, it doesn't matter if it has the value 0 or 123,456; the amount of space it takes up in RAM is the same.
The computer can store numbers like 3, 60000000, or -14 in 32 bits, one of the "fixed-width integers" we discussed earlier. All of these fixed-width integers take up constant space (O(1)).
Storing numbers as fixed-width integers introduces a trade-off. We have constant space complexity, and because each integer has a constant and expected number of bits, simple mathematical operations only take constant time. The cost of having an integer as fixed-width is that there is a limit to the number of integers you can represent.
What is the number of possible integer values you can store with 4 bytes? How did you make that calculation?
What is the number of possible integer values you can store with 8 bytes? How did you make that calculation?
When writing programs, you likely need to store several numbers, not just one integer.
So, let's say we wanted to write a program that allowed us to keep track of the number of hours we spent studying that day. We will round the number of hours to the nearest whole number to store them as fixed-width integers. Additionally, each day's hours will be represented by eight bits in binary.
So, we will start at memory address 0 in RAM, and each day, store the number of hours we studied in that "box" of RAM. For our first day that we are tracking, we store an 8-bit binary integer in "box" number 0. On the second day, we store an 8-bit binary integer in "box" number 1. This pattern continues.
Now, I'm sure you've already used an array when you are programming. An array is just an ordered sequential collection of data. Well, RAM is already structured like this. Right? Our days where we track the number of hours that we are studying are in sequential order in RAM.
Knowing this information, what can we do if we want to look up how many hours we studied on day 5 (index 4, because of zero-indexing)? Because all of the information is stored in sequential order, we can do simple math. If you are looking for the day 5 information (index 4), you take the starting address (0) and add 4 (the index). Or, if the starting address was 5 and you were looking for the 10th index, you'd go to memory address 15 (5 + 10).
This math works because we are using one "box" in memory for each day's record. If we were using 64-bit integers that take up 8 "boxes" in RAM, we would have to slightly adjust our math. In this case, we would have to multiply the index we were looking for by the number of bytes each record was stored in. So, if we were storing 64-bit integers (8 bytes) and wanted to find the item with index 4, and the starting index was 0, we would go to memory address 0 + (4 * 8) = 32.
Because accessing information from a specific index involves this simple mathematical computation, accessing items in an array is a constant time operation. For the mathematical computations to be consistent and straightforward, arrays have to follow specific rules. Each item in the array has to take up the same number of bytes in RAM. Also, each item has to be stored right next to the previous item in RAM. If there are any gaps or interruptions in the array, then the simple mathematical computation for accessing a particular item no longer works.
Let's say you need to store an array of 64-bit integers. Your array needs to have enough capacity for 24 integers. How many 1-byte slots of memory need to be allocated to store this array?
In this example, we will store some strings. A string, as we know, is just a bunch of characters or letters. One straightforward way to store a string is an array, so let's see how we can define some mappings to make it easier to store strings in arrays.
To use our 8-bit slots in memory, we need a way to encode each character in a string in 8-bits. One common character encoding to do this is called "ASCII". Here's how the alphabet is encoded in ASCII:
Letter | Encoding
A | 01000001
B | 01000010
C | 01000011
D | 01000100
E | 01000101
F | 01000110
G | 01000111
H | 01001000
I | 01001001
J | 01001010
K | 01001011
L | 01001100
M | 01001101
N | 01001110
O | 01001111
P | 01010000
Q | 01010001
R | 01010010
S | 01010011
T | 01010100
U | 01010101
V | 01010110
W | 01010111
X | 01011000
Y | 01011001
Z | 01011010
Since we can express characters as 8-bit integers, we can express strings as arrays of 8-bit characters.
For example, we could represent LAMBDA like so:
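Using the ASCII table above, each letter maps to one byte:
L = 01001100, A = 01000001, M = 01001101, B = 01000010, D = 01000100, A = 01000001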
Each character, once it was encoded, could be stored as one 8-bit slot in memory.
Draw out a model of a section of memory that stores the string "Computer Science" as an array of 8-bit ASCII characters.
Searching & Recursion
Logarithms are a way of looking differently at exponentials. I know that this is a bit of a vague definition, so let's look at an example: 2^5.
What does that mathematical expression mean? It's an abbreviation for repeated multiplication: 2 * 2 * 2 * 2 * 2 = 32.
What we are looking at above is two different ways to express an object that doubles in size with each iteration.
Another way to think about 2^5 = 32 is that 2 is the "growth rate" and 5 is the number of times you went through the growth (doubling). 32 is the final result.
Let's look at a few more expressions:
Now, to begin looking at logarithms, let's rewrite the exponential expressions above in logarithmic form.
Notice how we have essentially just moved around different pieces of the expression.
For our first expression, 2 was the "growth rate", 5 was the "time" spent growing, and 32 was where we ended up. When we rewrite this logarithmically, we have log_2 32 = 5.
In this case, 2 still represents the "growth rate" and 32 still represents where we end up. The 5 also still represents the "time" spent growing.
So, the difference between when we would use a logarithm and when we use exponentiation depends on what factors we know ahead of time. If you know the growth rate and you know how long you are growing, you can use exponentiation (2^5) to figure out where you end up (32). However, if you know the growth rate and where you end up but do not know the time spent growing, you can use a logarithm (log_2 32) to figure that out.
Logarithms have an inverse relationship with exponents, just like division and multiplication have an inverse relationship.
For example, if you know that you have one group of 5 items and you want to identify the total you would have if you had 4 of those groups instead of just one, you could express that with 5 * 4 = 20. However, if you knew that you had a total of 20 items and you wanted to know how many groups of 5 you could make out of your total, you could express that with 20 / 5 = 4.
In computer science, you often ask questions like "How many times must n be divided in half before we get to one?" or "How many times will we halve this collection before the collection has only one item?" To answer these questions, you can use logarithms! Halving is the inverse of doubling, so we can say that log_2 n would give us the answer we're seeking.
You will see this come up when analyzing the time complexity of specific algorithms. Any algorithm that doubles or halves a number or collection on each iteration of a loop is likely to have O(log n) time complexity. You will see this come up specifically when we talk about binary search and its time complexity. You will also see this come up in specific sorting algorithms (like merge sort). In simple terms, merge sort divides a collection in half and then merges the sorted halves. The fact that the algorithm repeatedly halves something is your clue that it includes a logarithm in its time complexity. One last place you're likely to see logarithms come up is with a perfect binary tree. One property of these binary trees is that the number of nodes doubles at each level.
Write a logarithmic expression that is identical to this exponential expression:
Write an exponential expression that is identical to this logarithmic expression:
What keywords should you look out for that might alert you that logarithms are involved?
Imagine that I've chosen a random number from 1 to 20. Then, you must guess the number. One approach would be to start guessing at 1 and increment your guess by 1 each time. If the number randomly selected was 20, then it would take you 20 guesses to get the correct answer. If the selected number was 1, then you would have the right answer on your very first guess. On average, you will get the correct answer on the 10th or 11th guess.
If the collection we are searching through is random and unsorted, linear search is the most efficient way to search through it. Once we have a sorted list, then there are more efficient approaches to use.
We want to write a simple program to conduct a linear search on a collection of data. Let's write a function that takes a list (arr) and an integer (target) as its input and returns the integer idx where the target is found. If the target does not exist in the arr, then the function should return -1.
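A sketch of such a linear search (names follow the spec above):

```python
def linear_search(arr, target):
    # Check each element in order until we find the target
    for idx in range(len(arr)):
        if arr[idx] == target:
            return idx
    # The target was not found anywhere in the list
    return -1


print(linear_search([4, 8, 15, 16, 23, 42], 16))  # 3
print(linear_search([4, 8, 15, 16, 23, 42], 7))   # -1
```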
The word recursion comes from the Latin word recurrere, meaning to run or hasten back, return, revert, or recur. Here are some online definitions of recursion:
Notice how the concept that is being defined, ancestors, shows up in its own definition. This is a recursive definition.
In programming, recursion has a very precise meaning. It refers to a coding technique in which a function calls itself.
Most programming problems are solvable without recursion. So, strictly speaking, recursion usually isn’t necessary.
However, some situations particularly lend themselves to a self-referential definition—for example, the definition of ancestors shown above. If you were devising an algorithm to handle such a case programmatically, a recursive solution would likely be cleaner and more concise.
On the other hand, recursion isn’t for every situation. Here are some other factors to consider:
For some problems, a recursive solution, though possible, will be awkward rather than elegant.
Recursive implementations often consume more memory than non-recursive ones.
In some cases, using recursion may result in slower execution time.
Typically, the readability of the code will be the biggest determining factor. But it depends on the circumstances. The examples presented below should help you get a feel for when you should choose recursion.
The same holds true if multiple instances of the same function are running concurrently. For example, consider the following definition:
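Based on the description that follows, the elided definition was presumably something like this:

```python
def function():
    x = 10          # x lives in this call's own namespace
    function()      # the function calls itself, with no way to stop
```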
When function() executes the first time, Python creates a namespace and assigns x the value 10 in that namespace. Then function() calls itself recursively. The second time function() runs, the interpreter creates a second namespace and assigns 10 to x there as well. These two instances of the name x are distinct from each other and can coexist without clashing because they are in separate namespaces.
As written, function() would in theory go on forever, calling itself over and over without any of the calls ever returning. In practice, of course, nothing is truly forever. Your computer only has so much memory, and it would run out eventually.
Technical note: You can find out what Python's recursion limit is with a function from the sys module called getrecursionlimit(), and you can change it with setrecursionlimit():
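A sketch of checking and raising the limit (1000 is CPython's usual default, but yours may differ):

```python
>>> from sys import getrecursionlimit, setrecursionlimit
>>> getrecursionlimit()
1000
>>> setrecursionlimit(2000)
>>> getrecursionlimit()
2000
```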
You can set it to be pretty large, but you can’t make it infinite.
There isn’t much use for a function to indiscriminately call itself recursively without end. It’s reminiscent of the instructions that you sometimes find on shampoo bottles: “Lather, rinse, repeat.” If you were to follow these instructions literally, you’d shampoo your hair forever!
This logical flaw has evidently occurred to some shampoo manufacturers, because some shampoo bottles instead say “Lather, rinse, repeat as necessary.” That provides a termination condition to the instructions. Presumably, you’ll eventually feel your hair is sufficiently clean to consider additional repetitions unnecessary. Shampooing can then stop.
Similarly, a function that calls itself recursively must have a plan to eventually stop. Recursive functions typically follow this pattern:
There are one or more base cases that are directly solvable without the need for further recursion.
Each recursive call moves the solution progressively closer to a base case.
You’re now ready to see how this works with some examples.
The first example is a function called countdown(), which takes a positive number as an argument and prints the numbers from the specified argument down to zero:
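A sketch that matches the description below:

```python
>>> def countdown(n):
...     print(n)
...     if n == 0:
...         return           # base case: stop the recursion
...     countdown(n - 1)     # recursive call moves toward the base case
...
>>> countdown(5)
5
4
3
2
1
0
```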
Notice how countdown() fits the paradigm for a recursive algorithm described above:
The base case occurs when n is zero, at which point recursion stops.
In the recursive call, the argument is one less than the current value of n, so each recursion moves closer to the base case.
Note: For simplicity, countdown() doesn't check its argument for validity. If n is either a non-integer or negative, you'll get a RecursionError exception because the base case is never reached.
The version of countdown() shown above clearly highlights the base case and the recursive call, but there's a more concise way to express it:
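One such more concise formulation:

```python
def countdown(n):
    print(n)
    if n > 0:
        countdown(n - 1)
```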
Here's one possible non-recursive implementation for comparison:
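A non-recursive sketch using a simple loop:

```python
>>> def countdown(n):
...     while n >= 0:
...         print(n)
...         n -= 1
...
>>> countdown(5)
5
4
3
2
1
0
```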
This is a case where the non-recursive solution is at least as clear and intuitive as the recursive one, and probably more so.
In other words, n! is the product of all integers from 1 to n, inclusive.
As with the example shown above, there are base cases that are solvable without recursion. The more complicated cases are reductive, meaning that they reduce to one of the base cases:
The base cases (n = 0 or n = 1) are solvable without recursion.
For values of n greater than 1, n! is defined in terms of (n - 1)!, so the recursive solution progressively approaches the base case.
The calculations of 4!, 3!, and 2! suspend until the algorithm reaches the base case where n = 1. At that point, 1! is computable without further recursion, and the deferred calculations run to completion.
Here's a recursive Python function to calculate factorial. Note how concise it is and how well it mirrors the definition shown above:
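A sketch of that recursive definition:

```python
>>> def factorial(n):
...     return 1 if n <= 1 else n * factorial(n - 1)
...
>>> factorial(4)
24
```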
Notice how all the recursive calls stack up. The function gets called with n = 4, 3, 2, and 1 in succession before any of the calls return. Finally, when n is 1, the problem can be solved without any more recursion. Then each of the stacked-up recursive calls unwinds back out, returning 1, 2, 6, and finally 24 from the outermost call.
Again, this shows that if a problem is solvable with recursion, there will also likely be several viable non-recursive solutions as well. You’ll typically choose based on which one results in the most readable and intuitive code.
Another factor to take into consideration is execution speed. There can be significant performance differences between recursive and non-recursive solutions. In the next section, you’ll explore these differences a little further.
timeit() first executes the commands contained in the specified <setup_string>. Then it executes <command> the given number of <iterations> and reports the cumulative execution time in seconds:
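A sketch of that timing call (the exact time will differ on your machine):

```python
from timeit import timeit

# Print the string one hundred times; timeit() returns the total elapsed
# seconds (just over 0.03 in the run described below).
elapsed = timeit("print(string)", setup="string = 'foobar'", number=100)
print(elapsed)
```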
Here, the setup parameter assigns string the value 'foobar'. Then timeit() prints string one hundred times. The total execution time is just over 3/100 of a second.
The examples shown below use timeit() to compare the recursive, iterative, and reduce() implementations of factorial from above. In each case, setup_string contains a setup string that defines the relevant factorial() function. timeit() then executes factorial(4) a total of ten million times and reports the aggregate execution time.
First is the recursive version, then the iterative implementation, and finally the version that uses reduce():
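A consolidated sketch of the three timing runs; the reported times vary from machine to machine, so no specific numbers are shown here:

```python
from timeit import timeit

recursive_setup = """
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)
"""

iterative_setup = """
def factorial(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
"""

reduce_setup = """
from functools import reduce
def factorial(n):
    return reduce(lambda a, b: a * b, range(1, n + 1), 1)
"""

# Each call runs factorial(4) ten million times and prints the elapsed seconds
print(timeit("factorial(4)", setup=recursive_setup, number=10_000_000))
print(timeit("factorial(4)", setup=iterative_setup, number=10_000_000))
print(timeit("factorial(4)", setup=reduce_setup, number=10_000_000))
```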
In this case, the iterative implementation is the fastest, although the recursive solution isn't far behind. The method using reduce() is the slowest. Your mileage will probably vary if you try these examples on your own machine. You certainly won't get the same times, and you may not even get the same ranking.
Does it matter? There's a difference of almost four seconds in execution time between the iterative implementation and the one that uses reduce(), but it took ten million calls to see it.
If you’ll be calling a function many times, you might need to take execution speed into account when choosing an implementation. On the other hand, if the function will run relatively infrequently, then the difference in execution times will probably be negligible. In that case, you’d be better off choosing the implementation that seems to express the solution to the problem most clearly.
For factorial, the timings recorded above suggest a recursive implementation is a reasonable choice.
Perhaps it might interest you to know how math.factorial() performs in the timing test. Wow! math.factorial() performs better than the best of the other three implementations shown above by roughly a factor of 10.
A function implemented in C will virtually always be faster than a corresponding function implemented in pure Python.
Suppose you wanted to count the number of leaf elements in this list (the lowest-level str objects) as though you'd flattened out the list. The leaf elements are "Adam", "Bob", "Chet", "Cat", "Barb", "Bert", "Alex", "Bea", "Bill", and "Ann", so the answer should be 10.
Just calling len() on the list doesn't give the correct answer. len() counts the objects at the top level of names, which are the three leaf elements "Adam", "Alex", and "Ann" and two sublists ["Bob", ["Chet", "Cat"], "Barb", "Bert"] and ["Bea", "Bill"].
What you need here is a function that traverses the entire list structure, sublists included. The algorithm goes something like this:
Walk through the list, examining each item in turn.
If you find a leaf element, then add it to the accumulated count.
If you encounter a sublist, then do the following:
Drop down into that sublist and similarly walk through it.
Once you’ve exhausted the sublist, go back up, add the elements from the sublist to the accumulated count, and resume the walk through the parent list where you left off.
Note the self-referential nature of this description: Walk through the list. If you encounter a sublist, then similarly walk through that list. This situation begs for recursion!
In the case of the names list, if an item is an instance of type list, then it's a sublist. Otherwise, it's a leaf item.
Now you have the tools in place to implement a function that counts leaf elements in a list, accounting for sublists recursively:
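A sketch of the whole thing: the names list reconstructed from the leaves and sublists described above, a recursive count_leaf_items(), and the checks discussed in this section:

```python
names = [
    "Adam",
    ["Bob", ["Chet", "Cat"], "Barb", "Bert"],
    "Alex",
    ["Bea", "Bill"],
    "Ann",
]


def count_leaf_items(item_list):
    """Recursively count the leaf (non-list) items in a nested list."""
    count = 0
    for item in item_list:
        if isinstance(item, list):
            count += count_leaf_items(item)  # drop down into the sublist
        else:
            count += 1                       # a leaf element
    return count


print(len(names))               # 5  -- counts only the top level
print(count_leaf_items(names))  # 10 -- counts every leaf element
```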
If you run count_leaf_items() on several lists, including the names list defined above, it returns the number of leaf elements in each (10 for names).
2^n = 64
log_2 64 = n
Example
For n = 1, the output should be fibonacciSimpleSum2(n) = true.
Explanation: 1 = 0 + 1 = F_0 + F_1.
For n = 11, the output should be fibonacciSimpleSum2(n) = true.
Explanation: 11 = 3 + 8 = F_4 + F_6.
For n = 60, the output should be fibonacciSimpleSum2(n) = true.
Explanation: 60 = 5 + 55 = F_5 + F_10.
For n = 66, the output should be fibonacciSimpleSum2(n) = false.
Input/Output
[execution time limit] 4 seconds (py3)
[input] integer n
Guaranteed constraints: 1 ≤ n ≤ 2 · 10^9.
[output] boolean
true if n can be represented as F_i + F_j, false otherwise.
Given an integer array nums sorted in ascending order, and an integer target.
Suppose that nums is rotated at some pivot unknown to you beforehand (i.e., [1,2,3,4,5,6,7] might become [4,5,6,7,1,2,3]).
You should search for target in nums and if found return its index, otherwise return -1.
Example 1: Input: nums = [6,7,1,2,3,4,5], target = 1 Output: 2
Example 2: Input: nums = [6,7,1,2,3,4,5], target = 3 Output: 4
Example 3: Input: nums = [1], target = 2 Output: -1
Your solution should have better than O(n) time complexity over the number of items in the list. There is an O(log n) solution. There is also an O(1) solution.
Note:
1 <= nums.length < 100
1 <= nums[i] <= 100
All values of nums are unique.
Numbers from 1 up to the length of the list will be contained in the list.
[execution time limit] 4 seconds (py3)
[input] array.integer nums
[input] integer target
[output] integer
Number Bases & Characters
Remember when we wondered what would happen if multiple keys hashed to the same index, and we said that we would worry about it later? Whelp, it's later 🤪.
Let's say we were given the key-value pair ("Ryan", 10). Our hash code then maps "Ryan" to index 3. Excellent, that works! Now let's say after we inserted ("Ryan", 10), we have to insert ("Parth", 12). Our hash code maps "Parth" to index 3. Uh oh! Ryan is already there! What do we do?? 😱
Ok, let's stop freaking out, and let's think about this. If we don't do anything, the value stored at index 3 will just get overwritten. Meaning if we try to retrieve the value associated with "Ryan", 12 will be returned instead of 10. That might not seem like a big deal, but what if we were returning passwords based on a user ID, and we returned someone else's password? That would be horrible.
Let's fix this problem. The most common way to solve this is with chaining. If we see multiple values hashed to an index, we will chain them in some data structure that can hold multiple items. In our case, we'll use Python's list type, but a more typical solution would use a linked list. We'll cover linked lists in a future module.
Ok, sounds ideal? But how does this work in code? Let's write some of it together.
Below is a partially filled out hash table class where we will be using HashTableEntry as our chain entries.
Take a look at the code below.
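A sketch of that starting point (the count attribute will matter later when we track the load factor):

```python
class HashTableEntry:
    """One entry in a chain: a key and its value."""
    def __init__(self, key, value):
        self.key = key
        self.value = value


class HashTable:
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = [None] * capacity  # each slot will hold a list (chain) of entries
        self.count = 0                    # number of items currently stored

    def hash_index(self, key):
        # djb2, wrapped into the table's range (same as earlier in the module)
        hash_value = 5381
        for char in key:
            hash_value = (hash_value * 33) + ord(char)
        return hash_value % self.capacity
```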
Let's implement the put method with collision resolution by chaining. What are the two cases we need to handle?
There are no entries at the index. Great! We can initialize the entry to a list with the new HashTableEntry in it.
There are multiple entries at the index. We need to check every entry in the chain. If the key in one of the entries is equal to the key we are passing in, we need to replace it. For instance, let's say we pass in ("Ryan", 12), and then we later pass in ("Ryan", 15). We would need to replace "Ryan"'s old value with 15. If there are no entries that match, we create a new entry at the end of the chain.
Ok, that might sound confusing. Let's start breaking it down into code.
First, we need to hash the key and start with the first entry at that index.
Next, we need to go through the chain. We need to check two conditions:
The current entry is not empty.
The key of the current entry is not equal to the key we are passing in.
Sweet! Now we need to check what happens when the loop breaks. It would only break for two reasons:
We reached an entry with the same key and need to replace the value.
We reached the end of the chain and need to create a new entry.
Let's write that in code!
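One way to write it, following the two cases and the loop conditions above (each slot holds a Python list of HashTableEntry objects, per the earlier note):

```python
class HashTable:
    # ... __init__ and hash_index as sketched above ...

    def put(self, key, value):
        index = self.hash_index(key)

        # Case 1: no entries at this index yet -- start a new chain
        if self.storage[index] is None:
            self.storage[index] = [HashTableEntry(key, value)]
            return

        chain = self.storage[index]
        i = 0
        # Walk the chain while entries remain and the keys don't match
        while i < len(chain) and chain[i].key != key:
            i += 1

        if i < len(chain):
            # The loop broke on a matching key: replace the old value
            chain[i].value = value
        else:
            # The loop broke at the end of the chain: append a new entry
            chain.append(HashTableEntry(key, value))
```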
Great! We created the put method.
What does runtime look like with linked list chaining?
The performance of hash tables for search, insertion, and deletion is constant time (O(1)) in the average case. However, as the chains get longer and longer, in the worst case, those same operations are done in linear time (O(n)). The more collisions that your hash table has, the less performant the hash table is. To avoid collisions, a proper hash function and a low load factor are crucial. What is a load factor?
The load factor of a hash table is trivial to calculate. You take the number of items stored in the hash table divided by the number of slots.
Hash tables use an array for storage. So, the load factor is the number of occupied slots divided by the length of the array. So, an array of length 10 with three items in it has a load factor of 0.3, and an array of length 20 with twenty items has a load factor of 1. If you use linear probing for collision resolution, then the maximum load factor is 1. If you use chaining for collision resolution, then the load factor can be greater than 1.
As the load factor of your hash table increases, so does the likelihood of a collision, which reduces your hash table's performance. Therefore, you need to monitor the load factor and resize your hash table when the load factor gets too large. The general rule of thumb is to resize your hash table when your load factor is greater than 0.7. Also, when you resize, it is common to double the size of the hash table. When you resize the array, you need to re-insert all of the items into this new hash table. You cannot simply copy the old items into the new hash table. Each item has to be rerun through the hashing function because the hashing function considers the size of the hash table when determining the index that it returns.
You can see that resizing is an expensive operation, so you don’t want to resize too often. However, when we average it out, hash tables are constant time (O(1)) even with resizing.
The load factor can also be too small. If the hash table is too large for the data it is storing, then memory is being wasted. So, in addition to resizing when the load factor gets too high, you should also resize when the load factor gets too low.
One way to know when to resize your hash table is to compute the load factor whenever an item is inserted or deleted into the hash table. If the load factor is too high or too low, then you need to resize.
We added a get_load_factor and a resize method to calculate the load factor and resize the hash table with a new capacity when necessary.
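A sketch of those two methods, continuing the HashTable class from above:

```python
class HashTable:
    # ... __init__, hash_index, and put as sketched above ...

    def get_load_factor(self):
        # Items stored divided by the number of slots
        return self.count / self.capacity

    def resize(self, new_capacity):
        old_storage = self.storage
        self.capacity = new_capacity
        self.storage = [None] * new_capacity
        self.count = 0

        # Every existing entry must be re-hashed, because hash_index
        # depends on the table's capacity
        for chain in old_storage:
            if chain is not None:
                for entry in chain:
                    self.put(entry.key, entry.value)
```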
Let's change our put method to resize when the load factor gets too high. Our current put method is the chained version sketched above.
To know when to resize, we need to correctly increment the count whenever we insert something new into the hash table. Let's go ahead and add that.
Next, we need to check if the load factor is greater than or equal to 0.7. If it is, we need to double our capacity and resize.
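Putting it together, a sketch of put that counts new items and resizes at a load factor of 0.7 or above:

```python
class HashTable:
    # ... other methods as sketched above ...

    def put(self, key, value):
        index = self.hash_index(key)

        if self.storage[index] is None:
            self.storage[index] = [HashTableEntry(key, value)]
            self.count += 1                      # brand-new item
        else:
            chain = self.storage[index]
            i = 0
            while i < len(chain) and chain[i].key != key:
                i += 1
            if i < len(chain):
                chain[i].value = value           # existing key: just overwrite
            else:
                chain.append(HashTableEntry(key, value))
                self.count += 1                  # brand-new item

        # Double the capacity once the table is 70% full
        if self.get_load_factor() >= 0.7:
            self.resize(self.capacity * 2)
```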
Fantastic, we did it!
Do we need to modify our delete and get methods to account for the new get_load_factor and resize methods? Why or why not?
The act or process of returning or running back
The act of defining an object (usually a function) in terms of that object itself
A method of defining a sequence of objects, such as an expression, function, or set, where some number of initial objects are given and each successive object is defined in terms of the preceding objects
A recursive definition is one in which the defined term appears in the definition itself. Self-referential situations often crop up in real life, even if they aren’t immediately recognizable as such. For example, suppose you wanted to describe the set of people that make up your ancestors. You could describe them this way:
Traversal of nested or tree-like data structures is another good example. Because these are nested structures, they readily fit a recursive definition. A non-recursive algorithm to walk through a nested structure is likely to be somewhat clunky, while a recursive solution will be relatively elegant. An example of this appears later in this tutorial.
When you call a function in Python, the interpreter creates a new namespace so that names defined within that function don’t clash with identical names defined elsewhere. One function can call another, and even if they both define objects with the same name, it all works out fine because those objects exist in separate namespaces.
Unfortunately, running function() as it stands produces a result that is less than inspiring, as the following shows:
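The result is a RecursionError traceback, roughly like this:

```python
>>> function()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in function
  File "<stdin>", line 3, in function
  ...
RecursionError: maximum recursion depth exceeded
```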
Python doesn’t allow that to happen. The interpreter limits the maximum number of times a function can call itself recursively, and when it reaches that limit, it raises a RecursionError, as you see above.
The next example involves the mathematical concept of factorial. The factorial of a positive integer n, denoted as n!, is defined as follows:
Factorial so lends itself to recursive definition that programming texts nearly always include it as one of the first examples. You can express the definition of n! recursively like this:
For example, recursive computation of 4! looks like this: 4! = 4 * 3! = 4 * (3 * 2!) = 4 * (3 * (2 * 1!)) = 4 * 3 * 2 * 1 = 24.
A little embellishment of this function with some print() statements would give a clearer idea of the call and return sequence.
Recursion isn’t necessary here. You could implement factorial() iteratively using a loop:
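An iterative sketch:

```python
>>> def factorial(n):
...     result = 1
...     for i in range(2, n + 1):
...         result *= i
...     return result
...
>>> factorial(4)
24
```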
You can also implement factorial using Python’s reduce(), which you can import from the functools module:
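A sketch using functools.reduce():

```python
>>> from functools import reduce
>>> def factorial(n):
...     return reduce(lambda a, b: a * b, range(1, n + 1), 1)
...
>>> factorial(4)
24
```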
To evaluate execution time, you can use a function called timeit() from a module that is also called timeit. This function supports a number of different formats, but you’ll use the following format in this tutorial:
Frankly, if you’re coding in Python, you don’t need to implement a factorial function at all. It’s already available in the standard math module:
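A quick check of the standard-library version:

```python
>>> from math import factorial
>>> factorial(4)
24
```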
Technical note: The fact that math.factorial() is so much speedier probably has nothing to do with whether it’s implemented recursively. More likely it’s because the function is implemented in C rather than Python. For more reading on Python and C, see these resources:
The next example involves visiting each item in a nested list structure. Consider the following Python list:
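Reconstructed from the leaves and sublists listed earlier in this module, the list presumably looks like this:

```python
names = [
    "Adam",
    ["Bob", ["Chet", "Cat"], "Barb", "Bert"],
    "Alex",
    ["Bea", "Bill"],
    "Ann",
]
```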
As the following diagram shows, names contains two sublists. The first of these sublists itself contains another sublist:
Recursion fits this problem very nicely. To solve it, you need to be able to determine whether a given list item is a leaf item or not. For that, you can use the built-in Python function isinstance().
As with the factorial example, adding some print() statements helps to demonstrate the sequence of recursive calls and return values.
For a given positive integer n, determine if it can be represented as a sum of two Fibonacci numbers (possibly equal).