arrow-left

All pages
gitbookPowered by GitBook
1 of 9

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

wk18

wk17chevron-rightOutline-W-18chevron-rightD1- Module 01 - Number Bases and Character Encodingchevron-rightD2- Module 02 - Hash Tables Ichevron-rightD3-Module 03 - Hash Tables IIchevron-rightD4- Module 04 - Searching and Recursionchevron-rightwk19chevron-rightwk20chevron-right

hashtag
Navigation

Website Version

hashtag
Table of contents

Curriculum

Utilities

practice

Resources

quick-reference

Python-Docs

MISC

Interview Prep

Installations Setup & Env

Aux-Exploration

homeworkarrow-up-right

  • D1-Module 01 - Python Iarrow-up-right

    • Install Pythonarrow-up-right

  • D2- Module 02 - Python IIarrow-up-right

  • D3- Module 03 - Python IIIarrow-up-right

  • D4-Module 04 - Python IVarrow-up-right

  • wk18arrow-up-right

    • Outline-W-18arrow-up-right

    • D1- Module 01 - Number Bases and Character Encodingarrow-up-right

  • wk19arrow-up-right

    • Outline-W-19arrow-up-right

    • D1- Module 01 - Linked Listsarrow-up-right

  • wk20arrow-up-right

    • Outline-W-20arrow-up-right

    • D1-Graphs Iarrow-up-right

  • Utilitiesarrow-up-right

    • YouTubearrow-up-right

    • Code Lab Notebook Embeds From Lecturearrow-up-right

    Abstract Data Structures:arrow-up-right

    • Untitledarrow-up-right

    • Industry Standard Algorithmsarrow-up-right

    Functionsarrow-up-right

  • Intro To Python w Jupyter Notebooksarrow-up-right

  • Calculating Big Oarrow-up-right

  • Python Cheat Sheetarrow-up-right

  • Code Signal CGA Sprint Resourcesarrow-up-right

  • YouTubearrow-up-right

  • Useful Linksarrow-up-right

    • My-Linksarrow-up-right

    • Beginners Guide To Pythonarrow-up-right

  • Python Glossaryarrow-up-right
  • indexarrow-up-right

  • List Of Python Cheat Sheetsarrow-up-right

  • Docsarrow-up-right
    • String-Methodsarrow-up-right

  • Built In Functionsarrow-up-right

  • Listsarrow-up-right

    • Examplesarrow-up-right

  • Dictionariesarrow-up-right

  • Classesarrow-up-right

    • Python Objects & Classesarrow-up-right

    • indexarrow-up-right

  • Queue & Stacksarrow-up-right

  • Python VS JavaScriptarrow-up-right

  • Python Modules & Python Packagesarrow-up-right

  • Miscarrow-up-right

  • Python's Default Argument Values and Listsarrow-up-right

  • SCRAParrow-up-right

  • 150 Practice Problems & Solutionsarrow-up-right

    OS Modulearrow-up-right

    Homearrow-up-right
    Navigationarrow-up-right
    Outlinearrow-up-right
    wk17arrow-up-right
    Outline-w17arrow-up-right
    Code lab Notebooksarrow-up-right
    Repl.ITarrow-up-right
    Trinketarrow-up-right
    Supplemental Practice:arrow-up-right
    Random Examplesarrow-up-right
    Promptsarrow-up-right
    Python VS JavaScriptarrow-up-right
    Misc. Resourcesarrow-up-right
    Things To Internalize:arrow-up-right
    Python Snippetsarrow-up-right
    Python Module Index:arrow-up-right
    Useful Infoarrow-up-right
    Basic Syntaxarrow-up-right
    Values Expressions & Statmentsarrow-up-right
    Python Standard Library (STDLIB)arrow-up-right
    Unsorted Examplesarrow-up-right
    Outlinearrow-up-right
    About Pythonarrow-up-right
    Interview Resourcesarrow-up-right
    How to Write an Effective Resume of Python Developerarrow-up-right
    Interview Checklistarrow-up-right
    python-setuparrow-up-right
    Subjectarrow-up-right
    List Directory Contentsarrow-up-right
    Employee Managerarrow-up-right

    Outline-W-18

    hashtag
    My Notion Notes:

    hashtag
    Overview

    In this sprint, you will learn about number bases, character encoding, and how values are stored in memory. We'll also talk all about hash tables, a very powerful data structure for getting more performance and flexibility out of your algorithms. Finally, we'll look at optimizing searching through data, as well as the powerful programming technique of recursion.

    hashtag
    Number Bases and Character Encoding

    This module will teach about RAM, number bases, fixed-width integers, how arrays are stored in memory, and character encoding. These basics allow you to better understand how other data structures are stored in memory and have an intuitive sense of the time complexity of specific data structures' operations.

    hashtag
    Hash Tables I

    In this module, you will learn about the properties of hash tables, hash functions, and how to implement a hash table in Python. Hash tables are arguably one of the most important and common data structures you use and encounter to solve algorithmic coding challenges.

    hashtag
    Hash Tables II

    In this module, you will learn about hash collisions and implement a complete hash table in Python that takes collisions into account.

    hashtag
    Searching and Recursion

    This module will teach about logarithms, linear search, binary search, and recursion. This material sets the stage for learning about traversal techniques of trees and graphs. Additionally, using a binary search is a common optimization technique that may come up during a technical interview.

    Hash Table / Hash Map In Python:

    hashtag
    Hash Table and Hashmap in Python

    Data requires a number of ways in which it can be stored and accessed. One of the most important implementations includes Hash Tables. In Python, these Hash tables are implemented through the built-in data type i.e, dictionary. In this article, you will learn what are Hash Tables and Hashmaps in Python and how you can implement them using dictionaries.

    Before moving ahead, let us take a look at all the topics of discussion:

    D1- Module 01 - Number Bases and Character Encoding

    hashtag
    Objective 01 - Understand random access memory (RAM) as it relates to data structures

    hashtag

    Merge Sortarrow-up-right

  • Insertion Sortarrow-up-right

  • D2- Module 02 - Hash Tables Iarrow-up-right
    D3-Module 03 - Hash Tables IIarrow-up-right
    D4- Module 04 - Searching and Recursionarrow-up-right
    D2- Module 02 - Queues and Stacksarrow-up-right
    D3- Module 03 - Binary Search Treesarrow-up-right
    D4- Module 04 - Tree Traversalarrow-up-right
    D2-Graphs 2arrow-up-right
    DFSarrow-up-right
    D4arrow-up-right
    Interview Practice Resourcesarrow-up-right
    Overflow Practice Problemsarrow-up-right
    Arrayarrow-up-right
    Extra-Arrayarrow-up-right
    Stackarrow-up-right
    Queuearrow-up-right
    Queue Sandboxarrow-up-right
    Binary Searcharrow-up-right
    BST Explainedarrow-up-right
    Binary Treearrow-up-right
    Binary Search Treearrow-up-right
    BST Insertarrow-up-right
    Recursionarrow-up-right
    Recursion Explainedarrow-up-right
    Recursion Examplesarrow-up-right
    Hash Tablearrow-up-right
    Linked Listarrow-up-right
    Double Linked Listarrow-up-right
    List Operationsarrow-up-right
    Sortingarrow-up-right
    SelectionSortarrow-up-right
    Quick Sortarrow-up-right
    Searchingarrow-up-right
    Graphsarrow-up-right
    Exoticarrow-up-right

    What is a Hash table or a Hashmap in Python?

  • Hash table vs Hashmap

  • Creating Dictionaries

  • Creating Nested Dictionaries

  • Performing Operations on Hash Tables using dictionaries

  • Accessing Values

  • Updating Values

  • Deleting Items

  • Converting a Dictionary into a Dataframe

  • In computer science, a Hash table or a Hashmap is a type of data structure that maps keys to its value pairs (implement abstract array data types). It basically makes use of a function that computes an index value that in turn holds the elements to be searched, inserted, removed, etc. This makes it easy and fast to access data. In general, hash tables store key-value pairs and the key is generated using a hash function.

    Hash tables or has maps in Python are implemented through the built-in dictionary data type. The keys of a dictionary in Python are generated by a hashing function. The elements of a dictionary are not ordered and they can be changed.

    An example of a dictionary can be a mapping of employee names and their employee IDs or the names of students along with their student IDs.

    Moving ahead, let’s see the difference between the hash table and hashmap in Python.

    Dictionaries can be created in two ways:

    • Using curly braces ({})

    • Using the dict() function

    hashtag
    Using curly braces:

    Dictionaries in Python can be created using curly braces as follows:

    EXAMPLE:

    OUTPUT:

    {‘Dave’: ‘001’, ‘Ava’: ‘002’, ‘Joe’: ‘003’} dict

    hashtag
    Using dict() function:

    Python has a built-in function, dict() that can be used to create dictionariesarrow-up-right in Python. For example:

    EXAMPLE:

    OUTPUT:

    {} dict

    In the above example, an empty dictionary is created since no key-value pairs are supplied as a parameter to the dict() function. In case you want to add values, you can do as follows:

    EXAMPLE:

    OUTPUT:

    {‘Dave’: ‘001’, ‘Ava’: ‘002’, ‘Joe’: ‘003’} dict

    Nested dictionaries are basically dictionaries that lie within other dictionaries. For example:

    EXAMPLE:

    There are a number of operations that can be performed on has tables in Python through dictionaries such as:

    • Accessing Values

    • Updating Values

    • Deleting Element

    hashtag
    Accessing Values:

    The values of a dictionary can be accessed in many ways such as:

    • Using key values

    • Using functions

    • Implementing the for loop

    hashtag
    Using key values:

    Dictionary values can be accessed using the key values as follows:

    EXAMPLE:

    OUTPUT: ‘ 001′

    hashtag
    Using functions:

    There are a number of built-in functions that can be used such as get(), keys(), values(), etc.

    EXAMPLE:

    OUTPUT:

    dict_keys([‘Dave’, ‘Ava’, ‘Joe’]) dict_values([‘001’, ‘002’, ‘003’]) 001

    hashtag
    Implementing the for loop:

    The for loop allows you to access the key-value pairs of a dictionary easily by iterating over them. For example:

    OUTPUT:

    All keys Dave Ava Joe All values 001 002 003 All keys and values Dave : 001 Ava : 002 Joe : 003

    Dictionaries are mutable data types and therefore, you can update them as and when required. For example, if I want to change the ID of the employee named Dave from ‘001’ to ‘004’ and if I want to add another key-value pair to my dictionary, I can do as follows:

    EXAMPLE:

    OUTPUT: {‘Dave’: ‘004’, ‘Ava’: ‘002’, ‘Joe’: ‘003’, ‘Chris’: ‘005’}

    There a number of functions that allow you to delete items from a dictionary such as del(), pop(), popitem(), clear(), etc. For example:

    EXAMPLE:

    OUTPUT: {‘Joe’: ‘003’}

    The above output shows that all the elements except ‘Joe: 003’ have been removed from the dictionary using the various functions.

    As you have seen previously, I have created a nested dictionary containing employee names and their details mapped to it. Now to make a clear table out of that, I will make use of the pandas library in order to put everything as a dataframe.

    EXAMPLE:

    OUTPUT:

    Overview

    Your computer has something called random access memory (RAM). Sometimes, people say "memory" when referring to RAM.

    hashtag
    Follow Along

    One thing that might come to your mind is that there are different types of memory on your computer. What about storing things like videos, text documents, and applications? Are those in "memory"? There is a distinction between "storage" and "memory". Things like videos and files are stored on a disc, not in RAM. RAM is faster than disc storage, but there isn't as much space available. Disc storage has more space, but it is slower.

    Think of RAM like a set of numbered, sequential mailboxes. Just like a set of mailboxes with numbered addresses, RAM is also sequential and has numbered addresses.

    Now, just like you can put something in a mailbox, you can also put something in RAM. Things that you put in RAM, we can call variables. Each "box" in RAM has an address.

    Each one of the "boxes" (memory addresses) in our set of mailboxes (RAM) holds 8 bits. You can think of each bit like a tiny switch that can either be "on" or "off." "On" is represented by a 1, and "off" is represented by a 0.

    Bits are often thought about in groups. A group of 8 bits is called a byte. Each "box" in RAM can hold 1 byte (8 bits).

    Now, a computer has more than just disc storage and RAM inside of it. There is also a processor. And, in between the processor and the RAM is something called a memory controller. The memory controller can access each box in RAM directly. It is as if the memory controller had tubes connected to each box of the set of mailboxes. Through those tubes, the memory controller can send and receive information directly to each box in RAM.

    Why is the direct connection between the memory controller and each box in RAM meaningful? It's so that the memory controller can jump around to which box it wants to communicate with quickly. Even though the boxes are in sequential order, the memory controller doesn't have to go through the boxes in order. It can access the first one, then jump to one somewhere in the middle, and then access one at the end. Because there is a direct connection, this is done quickly.

    Whenever you use a computer, you are very concerned with the speed of the computer you are using. So, computer designers made a way to optimize for speed when accessing items in RAM. Whenever a processor accesses a box in RAM, it also accesses and stores the boxes near it. Often, if you are accessing one thing in RAM, it's likely that the next thing you need to access is nearby. That's why keeping a copy of nearby items in the cache speeds things up.

    Whenever the processor reads something (say, the player's position in an old adventure game) out of RAM, it adds it to the cache to use it again in the future. Then, when it needs something else from RAM, it will go to the cache for it. As you can see, the cache helps the processor by saving execution cycles required to go out and read something from RAM.

    The processor, not RAM, has the actual cache. The memory controller keeps track of what goes into and comes out of the cache.

    We can think of it in several ways. Perhaps, the processor can use the cache as a temporary area to keep a copy of its last actions just in case it needs to reread them.

    There is one caveat — it is not as if "everything" goes out to RAM and then gets inserted into the cache. In reality, the cache holds only a handful of memory addresses from RAM. Also, note that these few memory addresses in the cache can be accessed faster than other storage locations.

    hashtag
    Challenge

    Draw a model of how a processor interacts with the cache, memory controller, and RAM whenever it needs to read or write from memory.

    hashtag
    Additional Resources

    • https://en.wikipedia.org/wiki/Random-access_memory (Links to an external site.)arrow-up-right

    • https://en.wikipedia.org/wiki/Memory_controller (Links to an external site.)arrow-up-right

    • https://en.wikipedia.org/wiki/CPU_cache (Links to an external site.)arrow-up-right

    hashtag
    Objective 02 - Convert back and forth from decimal to binary

    hashtag
    Overview

    Computers use the binary number system, so we will represent all of our variables in the binary number system.

    Instead of 10 digits like 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0, the binary number system only has two possible digits, 1 and 0. Another way to think of it is that computers only have switches (bits) that can be in an "off state" or an "on state."

    hashtag
    Follow Along

    Before we try to understand the binary number system, let's review how the decimal number system works. Let's look at the number "1001" in decimal.

    Even though there are two "1" digits in this number, they don't represent the same quantity. The leftmost "1" represents one thousand, and the rightmost "1" represents one unit. The "0"s in-between represent the tens place and the hundreds place.

    So this "1001" in base ten represents "1 thousand, 0 hundreds, 0 tens, and 1 one."

    Each successive digit in the base 10 number system is a power of ten. The ones place is 10^0 = 1. The tens place is 10^1 = 10. The hundreds place is 10^2 = 100. This pattern continues on and on.

    This pattern holds for other number systems as well. In the binary system, each successive digit represents a different power of 2. The first digit represents 2^0 = 1. The second digit represents 2^1 = 2. The third digit represents 2^2 = 4. Again, this pattern continues on and on.

    So, what if the number "1001" was in binary and not decimal? What would it represent then? Well, if we read it right to left, we have a "1" in the ones place, a "0" in the twos place, a "0" in the fours place, and a "1" in the eights place. We add these values up (8 + 0 + 0 + 1) which equals 9.

    Below, is a table that shows how to count up to 8 in binary and decimal:

    Decimal

    Binary

    0

    0000

    1

    0001

    2

    0010

    3

    0011

    4

    0100

    hashtag
    Challenge

    Convert the following decimal numbers into binary numbers:

    1. 25

    2. 63

    3. 9

    4. 111

    hashtag
    Additional Resources

    • https://www.mathsisfun.com/binary-number-system.html (Links to an external site.)arrow-up-right

    • https://www.mathsisfun.com/definitions/decimal-number-system.htmlarrow-up-right

    hashtag
    Objective 03 - Understand how fixed-width integers are stored in memory

    hashtag
    Overview

    We now know that things are stored in RAM using binary, and each "box" in RAM holds 1 byte (8 bits). What does that mean for what we can store in RAM? Let's say we have 1 byte of RAM to use. How many different numbers can we represent using only this 1 byte?

    Remember that each digit in a binary number is a successive power of 2. If we have 8 bits to use, we can store 2^8 = 256 different numbers in 1 byte.

    hashtag
    Follow Along

    Let's see if we can find a pattern:

    • With one bit, we can express two numbers (0 and 1)

    • With two bits, for each of the first numbers (0 or 1), we can put a 0 or a 1 after it, so we can express four numbers

    • With three bits, we can express eight numbers.

    Every time we add a new bit, we double the number of possible numbers we can express in binary. This pattern can be generalized as 2^n and 2^8 = 256.

    Often, computers use 4 bytes (32 bits) to represent our variables, meaning that we can express as many as 4 billion (2^32) possible values. Similarly, computers may use 8 bytes (64 bits) to represent our variables and can express over 10 billion (2^64).

    The 2^X in the binary number system is called the bitsize. Eight bytes of memory are called "8-bit", and 16 bytes are called "16-bit," etc.

    In theory, you could use less space to represent smaller integers. For instance, in binary, the number one is represented by 1. So, technically, to store one in binary, you only need one bit. But computers don't usually do this. Many integers take a fixed amount of space, no matter what number they might have in them. So, even though you only need one bit to represent the number one, the computer would still use 32 or 64 bits to do so.

    So, if a variable represents a fixed-width integer, it doesn't matter if it has the value 0 or 123,456; the amount of space it takes up in RAM is the same.

    The computer can store numbers like 3, 60000000, or -14 in 32 bits, one of the "fixed-width integers" we discussed earlier. All of these fixed-width integers take up constant space (O(1)).

    Storing numbers as fixed-width integers introduces a trade-off. We have constant space complexity, and because each integer has a constant and expected number of bits, simple mathematical operations only take constant time. The cost of having an integer as fixed-width is that there is a limit to the number of integers you can represent.

    hashtag
    Challenge

    1. What is the number of possible integer values you can store with 4 bytes? How did you make that calculation?

    2. What is the number of possible integer values you can store with 8 bytes? How did you make that calculation?

    hashtag
    Additional Resources

    • https://vladris.com/blog/2018/10/13/arithmetic-overflow-and-underflow.html (Links to an external site.)arrow-up-right

    hashtag
    Objective 04 - Describe, in general terms, how arrays are stored in memory and the time complexity of lookups

    hashtag
    Overview

    When writing programs, you likely need to store several numbers, not just one integer.

    hashtag
    Follow Along

    So, let's say we wanted to write a program that allowed us to keep track of the number of hours we spent studying that day. We will round the number of hours to the nearest whole number to store them as fixed-width integers. Additionally, each day's hours will be represented by eight bits in binary.

    So, we will start at memory address 0 in RAM, and each day, store the number of hours we studied in that "box" of RAM. For our first day that we are tracking, we store an 8-bit binary integer in "box" number 0. On the second day, we store an 8-bit binary integer in "box" number 1. This pattern continues.

    Now, I'm sure you've already used an array when you are programming. An array is just an ordered sequential collection of data. Well, RAM is already structured like this. Right? Our days where we track the number of hours that we are studying are in sequential order in RAM.

    Knowing this information, what can we do if we want to look up how many hours we studied on day 5 (index 4 because of zero-indexing)? Because all of the information is stored in sequential order, we can do simple math. If you are looking for the day 5 information (index 4), you need to know what the starting item address is 0 and then add 4 (the index). Or, if the starting address was 5 and you were looking for the 10th index, you'd go to memory address 15 (5 + 10).

    This math works because we are using one "box" in memory for each day's record. If we were using 64-bit integers that take up 8 "boxes" in RAM, we would have to slightly adjust our math. In this case, we would have to multiply the index we were looking for by the number of bytes each record was stored in. So, if we were storing 64-bit integers (8 bytes) and wanted to find the item with index 4, and the starting index was 0, we would go to memory address 0 + (4 * 8) = 32.

    Because accessing information from a specific index involves this simple mathematical computation, accessing items in an array is a constant time operation. For the mathematical computations to be consistent and straightforward, arrays have to follow specific rules. Each item in the array has to take up the same number of bytes in RAM. Also, each item has to be stored right next to the previous item in RAM. If there are any gaps or interruptions in the array, then the simple mathematical computation for accessing a particular item no longer works.

    hashtag
    Challenge

    Let's say you need to store an array of 64-bit integers. Your array needs to have enough capacity for 24 integers. How many 1-byte slots of memory need to be allocated to store this array?

    hashtag
    Additional Resources

    • https://en.wikipedia.org/wiki/Array_data_typearrow-up-right

    hashtag
    Objective 05 - Describe character encoding and how strings are stored in memory

    hashtag
    Overview

    In this example, we will store some strings. A string, as we know, is just a bunch of characters or letters. One straightforward way to store a string is an array, so let's see how we can define some mappings to make it easier to store strings in arrays.

    hashtag
    Follow Along

    To use our 8-bit slots in memory, we need a way to encode each character in a string in 8-bits. One common character encoding to do this is called "ASCII". Here's how the alphabet is encoded in ASCII:

    Letter

    Encoding

    A

    01000001

    B

    01000010

    C

    01000011

    D

    01000100

    E

    01000101

    Since we can express characters as 8-bit integers, we can express strings as arrays of 8-bit characters.

    For example, we could represent LAMBDA like so:

    Each character, once it was encoded, could be stored as one 8-bit slot in memory.

    hashtag
    Challenge

    Draw out a model of a section of memory that stores the string "Computer Science" as an array of 8-bit ASCII characters.

    hashtag
    Additional Resources

    • https://www.w3schools.com/charsets/ref_html_ascii.asparrow-up-right

    D2- Module 02 - Hash Tables I

    hashtag
    Objective 01 - Recall the time and space complexity, the strengths and weaknesses, and the common uses of a hash table

    hashtag
    Overview

    Hash tables are also called hash maps, maps, unordered maps, or dictionaries. A hash table is a structure that maps keys to values. This makes them extremely efficient for lookups because if you have the key, retrieving the associated value is a constant-time operation.

    hashtag
    Follow Along

    hashtag
    Time and Space Complexity

    hashtag
    Lookup

    Hash tables have fast lookups (O(1)) on average. However, in the worst case, they have slow (O(n)) lookups. The slow lookups happen when there is a hash collision (two different keys hash to the same index).

    hashtag
    Insert

    Hash tables have fast insertions (O(1)) on average. However, in the worst case, they have slow (O(n)) insertions. Just like with the lookups, the worst case occurs due to hash collisions.

    hashtag
    Delete

    Hash tables have fast deletes (O(1)) on average. However, in the worst case, they have slow (O(n)) deletions. Just like with lookups and insertions, the worst case occurs due to hash collisions.

    hashtag
    Space

    The space complexity of a hash table is linear (O(n)). Each key-value pair in the hash table will take up space in memory.

    hashtag
    Strengths

    The main reason why hash tables are great is that they have constant-time (O(1)) lookup operations in the average case. That makes them great to use in any situation where you will be conducting many lookup operations. The second reason they are great is that they allow you to use any hashable object as a key. This means they can be used in many different scenarios where you want to map one object (the key) to another object (the value).

    hashtag
    Weaknesses

    One weakness of the hash table is that the mapping goes only one way. So, if you know the key, it's incredibly efficient to retrieve the mapped value to that key. However, if you know the value and want to find the key that is mapped to that value, it is inefficient. Another weakness is that if your hash function produces lots of collisions, the hash table's time complexity gets worse and worse. This is because the underlying linked lists are inefficient for lookups.

    hashtag
    What About Hash Collisions?

    A hash collision is when our hash function returns the same index given two different keys. Theoretically, there is no perfect hash function, though some are better than others. Thus, any hash table implementation has to have a strategy to deal with the scenario when two different keys hash to the same index. You can't just have the hash table overwrite the already existing value.

    The most common strategy for dealing with hash collisions is not storing the values directly at an index of the hash table's array. Instead, the array index stores a pointer to a linked list. Each node in the linked list stores a key, value, and a pointer to the next item in the linked list.

    The above is just one of the ways to deal with hash collisions. Hopefully, you can now see why all of our hash table operations become O(n) in the worst case. What is the worst case? The worst case is when all of the keys collide at the same index in our hash table. If we have ten items in our hash table, all ten items are stored in one linked list at the same index of our array. That means that our hash table has the same efficiency as a linked list in the worst case.

    hashtag
    Challenge

    1. In your own words, explain how and why the time complexity of hash table operations degrades to O(n) in the worst case.

    hashtag
    Additional Resources

    hashtag
    Objective 02 - Describe and implement a hash function

    hashtag
    Overview

    Hashing functions take an input (usually a string) and return an integer as the output. Let's say we needed to store five colors in our hash table. Currently, we have an empty table that looks like this:

    Now, I need to assign an index given the name of a color. Our hash function will take the name of a color and convert it into an index.

    So, hash functions convert the strings into indexes, and then we store the given string into the computed index of an array.

    Hash Function + Array = Hash Table

    Now that we know what a hash table is let's dive deeper into creating a hash function.

    hashtag
    Follow Along

    To convert a string into an integer, hashing functions operate on the individual characters that make up the string.

    Let's use what we know to create a hashing function in Python.

    In Python, we can encode strings into their bytes representation with the .encode() method (read more ). Once encoded, an integer represents each character.

    Let's do this with the string hello

    Now that we’ve converted our string into a series of integers, we can manipulate those integers somehow. For simplicity’s sake, we can use a simple accumulator pattern to get a sum of all the integer values.

    Great! We turned a string into a number. Now, let's generalize this into a function.

    We aren't done yet 🤪. As shown earlier, hello returns 532. But, what if our hash table only has ten slots? We have to make 532 a number less than 10 😱.

    Remember the modulo operator %? We can use that in our hashing function to ensure that the integer the function returns is within a specific range.

    hashtag
    Objective 03 - Implement a user-defined HashTable class that allows basic operations

    hashtag
    Overview

    We define a hash table as an empty array and hash function as a function that takes a value and converts it into an array index where you will store that value. Let's put the two together. Let's implement a HashTable class where we can:

    • Insert values into a hash table

    • Retrieve values from a hash table

    • Delete values from a hash table

    Let's start with the insert function. For an insert, I need to insert a value with an associated key. Let's store the instructors at Lambda and where they live. We want to store:

    • ("Parth", "California")

    • ("Beej", "Oregon")

    • ("Dustin", "Utah")

    Here's what our HashTable class looks like right now:

    Let's break this down a little bit. Our init function takes in the length of a hash table and creates an empty array. Our hash_index function takes a key and computes and index using the famous djb2 hash function. Let's implement the other functions (put, delete, get).

    hashtag
    Follow Along

    hashtag
    The put Method

    Let's create our put function. Before we code, let's break down what needs to happen:

    • Given a key and a value, insert the respective value into a hash table array using the hashed key to determine the storage location index.

    Let's think about what we need to do:

    • Hash the key into an index using the hash function

    • Put the value into that index

    You might be thinking, "What if two keys hash to the same index?" That's a great question, and we will worry about that later. It's a nifty solution 🎉. But for now, let's worry about hashing a key and storing a value.

    First, let's call the hash function and store the return value in index:

    Next, let's insert the value at that index:

    There we go! Given a key, we hashed it and inserted a value. Again, we will worry about colliding indices later 😀.

    hashtag
    The delete Method

    Next, let's write our delete method. What does this method need to do? We can think about it as the inverse of the put method that we just defined. The function will receive a key as its input, then pass that key through the hash function to get the index where the hash table's value needs to be deleted.

    Let's start by getting the index by passing the key through the hashing function:

    Next, we need to delete the value from that index in our storage by setting it to None. Remember, we aren't dealing with collisions in this example. If we had to deal with collisions, this would be more complex.

    hashtag
    The get Method

    The last method we need to deal with is our get method. get is a simple method that retrieves the value stored at a specific key. The function needs to receive a key as an input, pass that key through the hashing function to find the index where the value is stored, and then return the value at that index.

    Let's start by getting the index from the key:

    Next, we need to return the value that is stored at the index.

    D3-Module 03 - Hash Tables II

    Number Bases & Characters

    Original:

    hashtag

    my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
    print(my_dict)
    type(my_dict)
    new_dict=dict()
    print(new_dict)
    type(new_dict)
    new_dict=dict(Dave = '001' , Ava= '002' , Joe= '003')
    print(new_dict)
    type(new_dict)
    emp_details = {'Employee': {'Dave': {'ID': '001',
     'Salary': 2000,
     'Designation':'Python Developer'},
     'Ava': {'ID':'002',
     'Salary': 2300,
     'Designation': 'Java Developer'},
     'Joe': {'ID': '003',
     'Salary': 1843,
     'Designation': 'Hadoop Developer'}}}
    my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
    my_dict\['Dave'\]
    my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
    print(my_dict.keys())
    print(my_dict.values())
    print(my_dict.get('Dave'))
    my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
    print("All keys")
    for x in my_dict:
     print(x) #prints the keys
    print("All values")
    for x in my_dict.values():
     print(x) #prints values
    print("All keys and values")
    for x,y in my_dict.items():
     print(x, ":" , y) #prints keys and values
    my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
    my_dict\['Dave'\] = '004' #Updating the value of Dave
    my_dict\['Chris'\] = '005' #adding a key-value pair
    print(my_dict)
    my_dict={'Dave': '004', 'Ava': '002', 'Joe': '003', 'Chris': '005'}
    del my_dict\['Dave'\] #removes key-value pair of 'Dave'
    my_dict.pop('Ava') #removes the value of 'Ava'
    my_dict.popitem() #removes the last inserted item
    print(my_dict)
    import pandas as pd
    emp_details = {'Employee': {'Dave': {'ID': '001',
     'Salary': 2000,
     'Designation':'Python Developer'},
     'Ava': {'ID':'002',
     'Salary': 2300,
     'Designation': 'Java Developer'},
     'Joe': {'ID': '003',
     'Salary': 1843,
     'Designation': 'Hadoop Developer'}}}
    df=pd.DataFrame(emp_details\['Employee'\])
    print(df)
    L -> 01001100
    A -> 01000001
    M -> 01001101
    B -> 01000010
    D -> 01000100
    A -> 01000001

    ("Ryan", "Utah")

    https://www.geeksforgeeks.org/hashing-data-structure/ (Links to an external site.)arrow-up-right
    here (Links to an external site.)arrow-up-right
    https://replit.com/@bgoonz/cs-unit-1-sprint-4-module-1-hash-function#main.pyarrow-up-right
    continued....
    https://tk-assets.lambdaschool.com/add0f486-f742-4b70-9885-88c6938237f8_Untitled.png
    https://tk-assets.lambdaschool.com/16439e40-5ec9-4242-b5c5-07584bc665ca_S5-M1-O1-Hash-Table-Animation.gif
    https://s3-us-west-2.amazonaws.com/secure.notion-static.com/18e22e25-8fb5-4763-b92e-ce3ac0d3e4e4/Untitled.png
    Objective 01 - Understand hash collisions and use a linked list for collision resolution in a user-defined Hashable class

    hashtag
    Overview

    Remember when we wondered what would happen if multiple keys hashed to the same index, and we said that we would worry about it later? Whelp, it's later 🤪.

    Let's say we were given the key-value pair ("Ryan", 10). Our hash code then maps "Ryan" to index 3. Excellent, that works!Now let's say after we inserted ("Ryan", 10), we have to insert ("Parth", 12). Our hash code maps "Parth" to index 3. Uh oh! Ryan is already there! What do we do?? 😱

    Ok, let's stop freaking out, and let's think about this. If we don't do anything, the value stored at index 3 will just get overwritten. Meaning if we try to retrieve the value associated with "Ryan", 12 will be returned instead of 10. That might not seem like a big deal, but what if we were returning passwords based on a user ID, and we returned someone else's password. That would be horrible.

    Let's fix this problem. The most common way to solve this is with chaining. If we see multiple values hashed to an index, we will chain them in a some data structure that can hold multiple items. In our case, we'll use Python's list type, but a more typical solution would use a linked list. We'll cover linked lists in a future module.

    https://tk-assets.lambdaschool.com/f952600c-f3e0-4d96-bb53-def08235c9c0_collision.gif

    Ok, sounds ideal? But how does this work in code? Let's write some of it together.

    hashtag
    Follow Along

    Below is a partially filled out hash table class where we will be using HashTableEntry as our chain entries.

    Take a look at the code below.

    Let's implement the put method with collision resolution by chaining. What are the two cases we need to handle?

    1. There are no entries at the index. Great! We can initialize the entry to a list with the new HashTableEntry in it.

    2. There are multiple entries at the index. We need to check every entry in the chain. If the key in one of the entries is equal to the key we are passing in, we need to replace it. For instance, let's say we pass in ("Ryan", 12), and then we later pass in ("Ryan", 15). We would need to replace "Ryan"'s old value with 15. If there are no entries that match, we create a new entry at the end of the chain.

    Ok, that might sound confusing. Let's start breaking it down into code.

    First, we need to hash the key and start with the first entry at that index.

    Next, we need to go through the chain. We need to check two conditions:

    1. The current entry is not empty.

    2. The key or the current entry is not equal to the key we are passing in.

    Sweet! Now we need to check what happens when the loop breaks. It would only break for two reasons:

    1. We reached an entry with the same key and need to replace the value.

    2. We reached the end of the chain and need to create a new entry.

    Let's write that in code!

    Great! We created the put method.

    hashtag
    Challenge

    https://replit.com/@bgoonz/cs-unit-1-sprint-4-module-2-hash-table-collision-resolution#main.pyarrow-up-right

    https://s3-us-west-2.amazonaws.com/secure.notion-static.com/155e4481-6522-4f77-8cc1-72004e760287/Untitled.png

    hashtag
    Objective 02 - Define and compute the load factor of a hash table and implement a hash table that automatically resizes based on load factor

    hashtag
    Overview

    What does runtime look like with linked list chaining?

    The performance of hash tables for search, insertion, and deletion is constant time (O(1)) in the average case. However, as the chains get longer and longer, in the worst case, those same operations are done in linear time (O(n)). The more collisions that your hash table has, the less performant the hash table is. To avoid collisions, a proper hash function and maintaining a low load factor is crucial. What is a load factor?

    hashtag
    Load Factor

    The load factor of a hash table is trivial to calculate. You take the number of items stored in the hash table divided by the number of slots.

    https://tk-assets.lambdaschool.com/59d00218-52e2-4f3d-9680-2b2d8baad3ae_S5-M3-O1LoadFactor.001.jpeg

    Hash tables use an array for storage. So, the load factor is the number of occupied slots divided by the length of the array. So, an array of length 10 with three items in it has a load factor of 0.3, and an array of length 20 with twenty items has a load factor of 1. If you use linear probing for collision resolution, then the maximum load factor is 1. If you use chaining for collision resolution, then the load factor can be greater than 1.

    As the load factor of your hash table increases, so does the likelihood of a collision, which reduces your hash table's performance. Therefore, you need to monitor the load factor and resize your hash table when the load factor gets too large. The general rule of thumb is to resize your hash table when your load factor is greater than 0.7. Also, when you resize, it is common to double the size of the hash table. When you resize the array, you need to re-insert all of the items into this new hash table. You cannot simply copy the old items into the new hash table. Each item has to be rerun through the hashing function because the hashing function considers the size of the hash table when determining the index that it returns.

    You can see that resizing is an expensive operation, so you don’t want to resize too often. However, when we average it out, hash tables are constant time (O(1)) even with resizing.

    The load factor can also be too small. If the hash table is too large for the data that it is storing, then memory is being wasted. So, in addition to resizing, when the load factor gets too high, you should also resize when the load factor gets too low.

    One way to know when to resize your hash table is to compute the load factor whenever an item is inserted or deleted into the hash table. If the load factor is too high or too low, then you need to resize.

    We added a get_load_factor and resize method to calculate the load factor and resize the hash table with a new capacity when necessary.

    hashtag
    Follow Along

    Let's change our put method to resize when the load factor gets too high. Here's how our current put method looks:

    To know when to resize, we need to correctly increment the count whenever we insert something new into the hash table. Let's go ahead and add that.

    Next, we need to check if the load factor is greater than or equal to 0.7. If it is, we need to double our capacity and resize.

    Fantastic, we did it!

    https://s3-us-west-2.amazonaws.com/secure.notion-static.com/18e22e25-8fb5-4763-b92e-ce3ac0d3e4e4/Untitled.png

    hashtag
    Objective 02 - Define and compute the load factor of a hash table and implement a hash table that automatically resizes based on load factor

    hashtag
    Overview

    What does runtime look like with linked list chaining?

    The performance of hash tables for search, insertion, and deletion is constant time (O(1)) in the average case. However, as the chains get longer and longer, in the worst case, those same operations are done in linear time (O(n)). The more collisions that your hash table has, the less performant the hash table is. To avoid collisions, a proper hash function and maintaining a low load factor is crucial. What is a load factor?

    hashtag
    Load Factor

    The load factor of a hash table is trivial to calculate. You take the number of items stored in the hash table divided by the number of slots.

    https://tk-assets.lambdaschool.com/59d00218-52e2-4f3d-9680-2b2d8baad3ae_S5-M3-O1LoadFactor.001.jpeg

    Hash tables use an array for storage. So, the load factor is the number of occupied slots divided by the length of the array. So, an array of length 10 with three items in it has a load factor of 0.3, and an array of length 20 with twenty items has a load factor of 1. If you use linear probing for collision resolution, then the maximum load factor is 1. If you use chaining for collision resolution, then the load factor can be greater than 1.

    As the load factor of your hash table increases, so does the likelihood of a collision, which reduces your hash table's performance. Therefore, you need to monitor the load factor and resize your hash table when the load factor gets too large. The general rule of thumb is to resize your hash table when your load factor is greater than 0.7. Also, when you resize, it is common to double the size of the hash table. When you resize the array, you need to re-insert all of the items into this new hash table. You cannot simply copy the old items into the new hash table. Each item has to be rerun through the hashing function because the hashing function considers the size of the hash table when determining the index that it returns.

    You can see that resizing is an expensive operation, so you don’t want to resize too often. However, when we average it out, hash tables are constant time (O(1)) even with resizing.

    The load factor can also be too small. If the hash table is too large for the data that it is storing, then memory is being wasted. So, in addition to resizing, when the load factor gets too high, you should also resize when the load factor gets too low.

    One way to know when to resize your hash table is to compute the load factor whenever an item is inserted or deleted into the hash table. If the load factor is too high or too low, then you need to resize.

    We added a get_load_factor and resize method to calculate the load factor and resize the hash table with a new capacity when necessary.

    hashtag
    Follow Along

    Let's change our put method to resize when the load factor gets too high. Here's how our current put method looks:

    To know when to resize, we need to correctly increment the count whenever we insert something new into the hash table. Let's go ahead and add that.

    Next, we need to check if the load factor is greater than or equal to 0.7. If it is, we need to double our capacity and resize.

    Fantastic, we did it!

    hashtag
    Challenge

    1. Do we need to modify our delete and get methods to account for the new get_load_factor and resize methods? Why or why not?

    hashtag
    Additional Resources

    • https://courses.csail.mit.edu/6.006/spring11/rec/rec07.pdf (Links to an external site.)arrow-up-right

    hashtag
    Homework:

    hashtag
    Given a string text, you need to use the characters of text to form as many instances of the word "lambda" as possible.

    hashtag
    You can use each character in text at most once.

    hashtag
    Write a function that returns the maximum number of instances of "lambda" that can be formed.

    hashtag
    Input: text = "mbxcdatlas"

    hashtag
    Output: 1

    hashtag
    Example 2:

    hashtag
    Input: text = "lalaaxcmbdtsumbdav"

    hashtag
    Output: 2

    hashtag
    Example 3:

    hashtag
    Input: text = "sctlamb"

    hashtag
    Output: 0

    hashtag
    Notes:

    hashtag
    text consists of lowercase English characters only

    hashtag
    [execution time limit] 4 seconds (py3)

    hashtag
    [input] string text

    hashtag
    [output] integer

    5

    0101

    6

    0110

    7

    0111

    8

    1000

    F

    01000110

    G

    01000111

    H

    01001000

    I

    01001001

    J

    01001010

    K

    01001011

    L

    01001100

    M

    01001101

    N

    01001110

    O

    01001111

    P

    01010000

    Q

    01010001

    R

    01010010

    S

    01010011

    T

    01010100

    U

    01010101

    V

    01010110

    W

    01010111

    X

    01011000

    Y

    01011001

    Z

    01011010

    Hash Table Use Cases

    hashtag
    Associative arrays

    Main article:

    Hash tables are commonly used to implement many types of in-memory tables. They are used to implement (arrays whose indices are arbitrary or other complicated objects), especially in like , , and .

    bytes_representation = "hello".encode()
    
    for byte in bytes_representation:
        print(byte)
    
    ### Print Output
    ### 104
    ### 101
    ### 108
    ### 108
    ### 111
    bytes_representation = "hello".encode()
    
    sum = 0
    for byte in bytes_representation:
        sum += byte
    
    print(sum)
    
    ### Print Output
    ### 532
    def my_hashing_func(str):
        bytes_representation = str.encode()
    
        sum = 0
        for byte in bytes_representation:
            sum += byte
    
        return sum
    def my_hashing_func(str, table_size):
        bytes_representation = str.encode()
    
        sum = 0
        for byte in bytes_representation:
            sum += byte
    
        return sum % table_size
    class HashTable:
        """
        A hash table with `capacity` buckets
        that accepts string keys
        """
    
        def __init__(self, capacity):
            self.capacity = capacity  # Number of buckets in the hash table
            self.storage = [None] * capacity
            self.item_count = 0
    
        def get_num_slots(self):
            """
            Return the length of the list you're using to hold the hash table data. (Not the number of items stored in the hash table,
            but the number of slots in the main list.)
            One of the tests relies on this.
            """
            return len(self.storage)
    
        def djb2(self, key):
            """
            DJB2 hash, 32-bit
            """
            # Cast the key to a string and get bytes
            str_key = str(key).encode()
    
            # Start from an arbitrary large prime
            hash_value = 5381
    
            # Bit-shift and sum value for each character
            for b in str_key:
                hash_value = ((hash_value << 5) + hash_value) + b
                hash_value &= 0xffffffff  # DJB2 is a 32-bit hash, only keep 32 bits
    
            return hash_value
    
        def hash_index(self, key):
            """
            Take an arbitrary key and return a valid integer index within the hash table's storage capacity.
            """
            return self.djb2(key) % self.capacity
    
        def put(self, key, value):
            """
            Store the value with the given key.
            """
    
        def delete(self, key):
            """
            Remove the value stored with the given key.
            Print a warning if the key is not found.
            """
    
        def get(self, key):
            """
            Retrieve the value stored with the given key.
            Returns None if the key is not found.
            """
    def put(self, key, value):
        """
        Store the value with the given key.
        """
        index = self.hash_index(key)
    def put(self, key, value):
        """
        Store the value with the given key.
        """
        index = self.hash_index(key)
        self.storage[index] = value
        return
    def delete(self, key):
        """
        Remove the value stored with the given key.
        """
        index = self.hash_index(key)
    def delete(self, key):
        """
        Remove the value stored with the given key.
        """
        index = self.hash_index(key)
        self.storage[index] = None
    def get(self, key):
        """
        Retrieve the value stored with the given key.
        Returns None if the key is not found.
        """
        index = self.hash_index(key)
    def get(self, key):
        """
        Retrieve the value stored with the given key.
        Returns None if the key is not found.
        """
        index = self.hash_index(key)
        return self.storage[index]
    class HashTableEntry:
        """
        Hash table key/value pair to go in our collision chain
        """
        def __init__(self, key, value):
            self.key = key
            self.value = value
    
    # Hash table can't have fewer than this many slots
    MIN_CAPACITY = 8
    
    class HashTable:
        """
        A hash table with `capacity` buckets
        that accepts string keys
        Implement this.
        """
    
        def __init__(self, capacity):
            self.capacity = capacity  # Number of buckets in the hash table
    
            self.storage = []
            for _ in range(capacity):   # Initialize with empty lists
                self.storage.append([])
    
            self.item_count = 0
    
        def get_num_slots(self):
            """
            Return the length of the list you're using to hold the hash table data. (Not the number of items stored in the hash table,
            but the number of slots in the main list.)
            One of the tests relies on this.
            Implement this.
            """
            # Your code here
    
        def get_load_factor(self):
            """
            Return the load factor for this hash table.
            Implement this.
            """
            return len(self.storage)
    
        def djb2(self, key):
            """
            DJB2 hash, 32-bit
            Implement this, and/or FNV-1.
            """
            str_key = str(key).encode()
    
            hash = FNV_offset_basis_64
    
            for b in str_key:
                hash *= FNV_prime_64
                hash ^= b
                hash &= 0xffffffffffffffff  # 64-bit hash
    
            return hash
    
        def hash_index(self, key):
            """
            Take an arbitrary key and return a valid integer index between within the hash table's storage capacity.
            """
            return self.djb2(key) % self.capacity
    
        def put(self, key, value):
            """
            Store the value with the given key.
            Hash collisions should be handled with Linked List Chaining.
            Implement this.
            """
            # Your code here
    
        def delete(self, key):
            """
            Remove the value stored with the given key.
            Print a warning if the key is not found.
            Implement this.
            """
            # Your code here
    
        def get(self, key):
            """
            Retrieve the value stored with the given key.
            Returns None if the key is not found.
            Implement this.
            """
            # Your code here
        def put(self, key, value):
            """
            Store the value with the given key.
            Hash collisions should be handled with Linked List Chaining.
            Implement this.
            """
            # Your code here
        def put(self, key, value):
            """
            Store the value with the given key.
            Hash collisions should be handled with Linked List Chaining.
            Implement this.
            """
            index = self.hash_index(key)
    
            chain = self.storage[index]
        def put(self, key, value):
            """
            Store the value with the given key.
            Hash collisions should be handled with Linked List Chaining.
            Implement this.
            """
            index = self.hash_index(key)
    
            chain = self.storage[index]
    
            existing_entry = None
    
            for current_entry in chain:
                if current_entry.key == key:
                    exiting_entry = current_entry
                    break
        def put(self, key, value):
            """
            Store the value with the given key.
            Hash collisions should be handled with Linked List Chaining.
            Implement this.
            """
            index = self.hash_index(key)
    
            chain = self.storage[index]
    
            existing_entry = None
    
            for current_entry in chain:
                if current_entry.key == key:
                    existing_entry = current_entry
                    break
    
            if existing_entry is not None:
                existing_entry.value = value
            else:
                new_entry = HashTableEntry(key, value)
                chain.append(new_entry)
    class HashTableEntry:
        """
        Linked List hash table key/value pair
        """
        def __init__(self, key, value):
            self.key = key
            self.value = value
            self.next = None
    
    # Hash table can't have fewer than this many slots
    MIN_CAPACITY = 8
    
    class HashTable:
        """
        A hash table with `capacity` buckets
        that accepts string keys
        Implement this.
        """
    
        def __init__(self, capacity):
            self.capacity = capacity  # Number of buckets in the hash table
            self.storage = [None] * capacity
            self.item_count = 0
    
        def get_num_slots(self):
            """
            Return the length of the list you're using to hold the hash
            table data. (Not the number of items stored in the hash table,
            but the number of slots in the main list.)
            One of the tests relies on this.
            Implement this.
            """
            # Your code here
    
        def get_load_factor(self):
            """
            Return the load factor for this hash table.
            Implement this.
            """
            return self.item_count / self.capacity
    
        def resize(self, new_capacity):
            """
            Changes the capacity of the hash table and
            rehashes all key/value pairs.
            Implement this.
            """
            old_storage = self.storage
            self.capacity = new_capacity
            self.storage = [None] * self.capacity
    
            current_entry = None
    
            # Save this because put adds to it, and we don't want that.
            # It might be less hackish to pass a flag to put indicating that
            # we're in a resize and don't want to modify item count.
            old_item_count = self.item_count
    
            for bucket_item in old_storage:
                current_entry = bucket_item
                while current_entry is not None:
                    self.put(current_entry.key, current_entry.value)
                    current_entry = current_entry.next
    
            # Restore this to the correct number
            self.item_count = old_item_count
    
        def djb2(self, key):
            """
            DJB2 hash, 32-bit
            Implement this, and/or FNV-1.
            """
            str_key = str(key).encode()
    
            hash = FNV_offset_basis_64
    
            for b in str_key:
                hash *= FNV_prime_64
                hash ^= b
                hash &= 0xffffffffffffffff  # 64-bit hash
    
            return hash
    
        def hash_index(self, key):
            """
            Take an arbitrary key and return a valid integer index
            within the hash table's storage capacity.
            """
            return self.djb2(key) % self.capacity
    
        def put(self, key, value):
            """
            Store the value with the given key.
            Hash collisions should be handled with Linked List Chaining.
            Implement this.
            """
            index = self.hash_index(key)
    
            current_entry = self.storage[index]
    
            while current_entry is not None and current_entry.key != key:
                current_entry = current_entry.next
    
            if current_entry is not None:
                current_entry.value = value
            else:
                new_entry = HashTableEntry(key, value)
                new_entry.next = self.storage[index]
                self.storage[index] = new_entry
    
        def delete(self, key):
            """
            Remove the value stored with the given key.
            Print a warning if the key is not found.
            Implement this.
            """
            # Your code here
    
        def get(self, key):
            """
            Retrieve the value stored with the given key.
            Returns None if the key is not found.
            Implement this.
            """
            # Your code here
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
    
        current_entry = self.storage[index]
    
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
    
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
    
        current_entry = self.storage[index]
    
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
    
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
    
            self.item_count += 1
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
    
        current_entry = self.storage[index]
    
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
    
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
    
            self.item_count += 1
    
            if self.get_load_factor() > 0.7:
                self.resize(self.capacity * 2)
    class HashTableEntry:
        """
        Linked List hash table key/value pair
        """
        def __init__(self, key, value):
            self.key = key
            self.value = value
            self.next = None
    
    # Hash table can't have fewer than this many slots
    MIN_CAPACITY = 8
    
    class HashTable:
        """
        A hash table with `capacity` buckets
        that accepts string keys
        Implement this.
        """
    
        def __init__(self, capacity):
            self.capacity = capacity  # Number of buckets in the hash table
            self.storage = [None] * capacity
            self.item_count = 0
    
        def get_num_slots(self):
            """
            Return the length of the list you're using to hold the hash
            table data. (Not the number of items stored in the hash table,
            but the number of slots in the main list.)
            One of the tests relies on this.
            Implement this.
            """
            # Your code here
    
        def get_load_factor(self):
            """
            Return the load factor for this hash table.
            Implement this.
            """
            return self.item_count / self.capacity
    
        def resize(self, new_capacity):
            """
            Changes the capacity of the hash table and
            rehashes all key/value pairs.
            Implement this.
            """
            old_storage = self.storage
            self.capacity = new_capacity
            self.storage = [None] * self.capacity
    
            current_entry = None
    
            # Save this because put adds to it, and we don't want that.
            # It might be less hackish to pass a flag to put indicating that
            # we're in a resize and don't want to modify item count.
            old_item_count = self.item_count
    
            for bucket_item in old_storage:
                current_entry = bucket_item
                while current_entry is not None:
                    self.put(current_entry.key, current_entry.value)
                    current_entry = current_entry.next
    
            # Restore this to the correct number
            self.item_count = old_item_count
    
        def djb2(self, key):
            """
            DJB2 hash, 32-bit
            Implement this, and/or FNV-1.
            """
            str_key = str(key).encode()
    
            hash = FNV_offset_basis_64
    
            for b in str_key:
                hash *= FNV_prime_64
                hash ^= b
                hash &= 0xffffffffffffffff  # 64-bit hash
    
            return hash
    
        def hash_index(self, key):
            """
            Take an arbitrary key and return a valid integer index
            within the hash table's storage capacity.
            """
            return self.djb2(key) % self.capacity
    
        def put(self, key, value):
            """
            Store the value with the given key.
            Hash collisions should be handled with Linked List Chaining.
            Implement this.
            """
            index = self.hash_index(key)
    
            current_entry = self.storage[index]
    
            while current_entry is not None and current_entry.key != key:
                current_entry = current_entry.next
    
            if current_entry is not None:
                current_entry.value = value
            else:
                new_entry = HashTableEntry(key, value)
                new_entry.next = self.storage[index]
                self.storage[index] = new_entry
    
        def delete(self, key):
            """
            Remove the value stored with the given key.
            Print a warning if the key is not found.
            Implement this.
            """
            # Your code here
    
        def get(self, key):
            """
            Retrieve the value stored with the given key.
            Returns None if the key is not found.
            Implement this.
            """
            # Your code here
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
    
        current_entry = self.storage[index]
    
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
    
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
    
        current_entry = self.storage[index]
    
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
    
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
    
            self.item_count += 1
    def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
    
        current_entry = self.storage[index]
    
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
    
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
    
            self.item_count += 1
    
            if self.get_load_factor() > 0.7:
                self.resize(self.capacity * 2)
    
    def csMaxNumberOfLambdas(text):
        sub_string = "lambda"
        lambda_count = {"l": 0, "a": 0, "m": 0, "b": 0, "d": 0, "a": 0}
        counts = []
        for letter in text:
            if letter in lambda_count:
                lambda_count[letter] += 1
        for key, value in lambda_count.items():
            counts.append(value)
        return min(counts)
    
    When storing a new item into a and a hash collision occurs, the multimap unconditionally stores both items.

    When storing a new item into a typical associative array and a hash collision occurs, but the actual keys themselves are different, the associative array likewise stores both items. However, if the key of the new item exactly matches the key of an old item, the associative array typically erases the old item and overwrites it with the new item, so every item in the table has a unique key.

    hashtag
    Database indexing

    Hash tables may also be used as -based data structures and (such as in ) although are more popular in these applications. In multi-node database systems, hash tables are commonly used to distribute rows amongst nodes, reducing network traffic for .

    hashtag
    Caches[]

    Main article:

    Hash tables can be used to implement , auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries—usually erasing the old item that is currently stored in the table and overwriting it with the new item, so every item in the table has a unique hash value.

    hashtag
    Sets[]

    Besides recovering the entry that has a given key, many hash table implementations can also tell whether such an entry exists or not.

    Those structures can therefore be used to implement a , which merely records whether a given key belongs to a specified set of keys. In this case, the structure can be simplified by eliminating all parts that have to do with the entry values. Hashing can be used to implement both static and dynamic sets.

    hashtag
    Object representation

    Several dynamic languages, such as , , , , and , use hash tables to implement . In this representation, the keys are the names of the members and methods of the object, and the values are pointers to the corresponding member or method.

    hashtag
    Unique data representation

    Main article:

    Hash tables can be used by some programs to avoid creating multiple character strings with the same contents. For that purpose, all strings in use by the program are stored in a single string pool implemented as a hash table, which is checked whenever a new string has to be created. This technique was introduced in interpreters under the name , and can be used with many other kinds of data ( in a system, records in a database, files in a file system, binary decision diagrams, etc.).

    hashtag
    Transposition table

    Main article:

    A to a complex Hash Table which stores information about each section that has been searched.

    Associative arrayarrow-up-right
    associative arraysarrow-up-right
    stringsarrow-up-right
    interpretedarrow-up-right
    programming languagesarrow-up-right
    Rubyarrow-up-right
    Pythonarrow-up-right
    PHParrow-up-right
    multimaparrow-up-right
    diskarrow-up-right
    database indicesarrow-up-right
    dbmarrow-up-right
    B-treesarrow-up-right
    hash joinsarrow-up-right
    editarrow-up-right
    Cache (computing)arrow-up-right
    cachesarrow-up-right
    editarrow-up-right
    set data structurearrow-up-right
    [43]arrow-up-right
    Perlarrow-up-right
    Pythonarrow-up-right
    JavaScriptarrow-up-right
    Luaarrow-up-right
    Rubyarrow-up-right
    objectsarrow-up-right
    String interningarrow-up-right
    Lisparrow-up-right
    hash consingarrow-up-right
    expression treesarrow-up-right
    symbolic algebraarrow-up-right
    Transposition tablearrow-up-right
    transposition tablearrow-up-right
    [44]arrow-up-right

    D4- Module 04 - Searching and Recursion

    Searching & Recursion

    Recursion Explainedchevron-rightRecursion Exampleschevron-right

    hashtag
    Objective 01 - Understand logarithms and recall the common cases where they come up in technical interviews

    hashtag
    Overview

    hashtag
    What is a logarithm?

    Logarithms are a way of looking differently at exponentials. I know that this is a bit of a vague definition, so let's look at an example:

    What does the mathematical expression above mean? It's an abbreviation for the following expression:

    What we are looking at above is two different ways to express an object that doubles in size with each iteration.

    Another way to think about 2^5 = 32 is that 2 is the "growth rate" and 5 is the number of times you went through the growth (doubling). 32 is the final result.

    Let's look at a few more expressions:

    Now, to begin looking at logarithms, let's rewrite the exponential expressions above in logarithmic form.

    Notice how we have essentially just moved around different pieces of the expression.

    For our first expression,

    2 was the "growth rate", 5 was the "time" spent growing, and 32 was where we ended up. When we rewrite this logarithmically, we have

    In this case, 2 still represents the "growth rate" and 32 still represents where we end up. The 5 also still represents the "time" spent growing.

    So, the difference between when we would use a logarithm and when we use exponentiation depends on what factors we know ahead of time. If you know the growth rate and you know how long you are growing, you can use exponentiation (2^5) to figure out where you end up (32). However, if you know the growth rate and where you end up but do not know the time spent growing, you can use a logarithm (log_2 32) to figure that out.

    Logarithms have an inverse relationship with exponents, just like division and multiplication have an inverse relationship.

    For example, if you know that you have one group of 5 items and you want to identify the total you would have if you had 4 of those groups instead of just one, you could express that with 5 * 4 = 20. However, if you knew that you had a total of 20 items and you wanted to know how many groups of 5 you could make out your total, you could express that with 20 \ 5 = 4.

    hashtag
    Follow Along

    hashtag
    Why should I care? What does this have to do with programming and interview preparation?

    In computer science, you often ask questions like "How many times must n be divided in half before we get to one?" or "How many times will we halve this collection before the collection has only one item?" To answer these questions, you can use logarithms! Halving is like doubling, so we can say that log_2 n would give us the answer we're seeking.

    You will see this come up when analyzing the time complexity of specific algorithms. Any algorithm that doubles or halves a number or collection on each iteration of a loop is likely to have O(log n) time complexity. You will see this come up specifically when we talk about binary search and its time complexity. You will also see this come up in specific sorting algorithms (like merge sort). In simple terms, merge sort divides a collection in half and then merges the sorted halves. The fact that the algorithm repeatedly halves something is your clue that it includes a logarithm in its time complexity. One last place you're likely to see logarithms come up is with a perfect binary tree. One property of these binary trees is that the number of nodes doubles at each level.

    hashtag
    Challenge

    1. Write a logarithmic expression that is identical to this exponential expression:

    2. Write an exponential expression that is identical to this logarithmic expression:

    hashtag
    Additional Resources

    hashtag
    Objective 02 - Write a linear search algorithm

    hashtag
    Overview

    Imagine that I've chosen a random number from 1 to 20. Then, you must guess the number. One approach would be to start picking at 1 and increment your guess by 1 with each guess. If the computer randomly selected 20, then it would take you 20 guesses to get the correct answer. If the computer guessed 1, then you would have the right answer on your very first guess. On average, you will get the correct answer on the 10th or 11th guess.

    If the collection we are searching through is random and unsorted, linear search is the most efficient way to search through it. Once we have a sorted list, then there are more efficient approaches to use.

    hashtag
    Follow Along

    We want to write a simple program to conduct a linear search on a collection of data. Let's write a function that takes a list (arr) and an integer (target) as its input and returns the integer idx where the target is found. If the target does not exist in the arr, then the function should return -1.

    hashtag
    Challenge

    hashtag
    What Is Recursion?

    The word recursion comes from the Latin word recurrere, meaning to run or hasten back, return, revert, or recur. Here are some online definitions of recursion:

    • The act or process of returning or running back

    • The act of defining an object (usually a function) in terms of that object itself

    • A method of defining a sequence of objects, such as an expression, function, or set, where some number of initial objects are given and each successive object is defined in terms of the preceding objects

    A recursive definition is one in which the defined term appears in the definition itself. Self-referential situations often crop up in real life, even if they aren’t immediately recognizable as such. For example, suppose you wanted to describe the set of people that make up your ancestors. You could describe them this way:

    Notice how the concept that is being defined, ancestors, shows up in its own definition. This is a recursive definition.

    In programming, recursion has a very precise meaning. It refers to a coding technique in which a function calls itself.

    hashtag
    Why Use Recursion?

    Most programming problems are solvable without recursion. So, strictly speaking, recursion usually isn’t necessary.

    However, some situations particularly lend themselves to a self-referential definition—for example, the definition of ancestors shown above. If you were devising an algorithm to handle such a case programmatically, a recursive solution would likely be cleaner and more concise.

    Traversal of is another good example. Because these are nested structures, they readily fit a recursive definition. A non-recursive algorithm to walk through a nested structure is likely to be somewhat clunky, while a recursive solution will be relatively elegant. An example of this appears later in this tutorial.

    On the other hand, recursion isn’t for every situation. Here are some other factors to consider:

    • For some problems, a recursive solution, though possible, will be awkward rather than elegant.

    • Recursive implementations often consume more memory than non-recursive ones.

    • In some cases, using recursion may result in slower execution time.

    Typically, the readability of the code will be the biggest determining factor. But it depends on the circumstances. The examples presented below should help you get a feel for when you should choose recursion.

    hashtag
    Recursion in Python

    When you call a function in Python, the interpreter creates a new so that names defined within that function don’t with identical names defined elsewhere. One function can call another, and even if they both define objects with the same name, it all works out fine because those objects exist in separate namespaces.

    The same holds true if multiple instances of the same function are running concurrently. For example, consider the following definition:

    When function() executes the first time, Python creates a namespace and assigns x the value 10 in that namespace. Then function() calls itself recursively. The second time function() runs, the interpreter creates a second namespace and assigns 10 to x there as well. These two instances of the name x are distinct from each another and can coexist without clashing because they are in separate namespaces.

    Unfortunately, running function() as it stands produces a result that is less than inspiring, as the following shows:>>>

    As written, function() would in theory go on forever, calling itself over and over without any of the calls ever returning. In practice, of course, nothing is truly forever. Your computer only has so much memory, and it would run out eventually.

    Python doesn’t allow that to happen. The interpreter limits the maximum number of times a function can call itself recursively, and when it reaches that limit, it raises a RecursionError , as you see above.

    Technical note: You can find out what Python’s recursion limit is with a function from the sys module called getrecursionlimit():>>>

    You can change it, too, with setrecursionlimit():>>>

    You can set it to be pretty large, but you can’t make it infinite.

    There isn’t much use for a function to indiscriminately call itself recursively without end. It’s reminiscent of the instructions that you sometimes find on shampoo bottles: “Lather, rinse, repeat.” If you were to follow these instructions literally, you’d shampoo your hair forever!

    This logical flaw has evidently occurred to some shampoo manufacturers, because some shampoo bottles instead say “Lather, rinse, repeat as necessary.” That provides a termination condition to the instructions. Presumably, you’ll eventually feel your hair is sufficiently clean to consider additional repetitions unnecessary. Shampooing can then stop.

    Similarly, a function that calls itself recursively must have a plan to eventually stop. Recursive functions typically follow this pattern:

    • There are one or more base cases that are directly solvable without the need for further recursion.

    • Each recursive call moves the solution progressively closer to a base case.

    You’re now ready to see how this works with some examples.

    hashtag
    Get Started: Count Down to Zero

    The first example is a function called countdown(), which takes a positive number as an argument and prints the numbers from the specified argument down to zero:>>>

    Notice how countdown() fits the paradigm for a recursive algorithm described above:

    • The base case occurs when n is zero, at which point recursion stops.

    • In the recursive call, the argument is one less than the current value of n, so each recursion moves closer to the base case.

    Note: For simplicity, countdown() doesn’t check its argument for validity. If n is either a non-integer or negative, you’ll get a RecursionError exception because the base case is never reached.

    The version of countdown() shown above clearly highlights the base case and the recursive call, but there’s a more concise way to express it:

    Here’s one possible non-recursive implementation for comparison:>>>

    This is a case where the non-recursive solution is at least as clear and intuitive as the recursive one, and probably more so.

    hashtag
    Calculate Factorial

    The next example involves the mathematical concept of . The factorial of a positive integer n, denoted as n!, is defined as follows:

    In other words, n! is the product of all integers from 1 to n, inclusive.

    Factorial so lends itself to recursive definition that programming texts nearly always include it as one of the first examples. You can express the definition of n! recursively like this:

    As with the example shown above, there are base cases that are solvable without recursion. The more complicated cases are reductive, meaning that they reduce to one of the base cases:

    • The base cases (n = 0 or n = 1) are solvable without recursion.

    • For values of n greater than 1, n! is defined in terms of (n - 1)!, so the recursive solution progressively approaches the base case.

    For example, recursive computation of 4! looks like this:Recursive Calculation of 4!

    The calculations of 4!, 3!, and 2! suspend until the algorithm reaches the base case where n = 1. At that point, 1! is computable without further recursion, and the deferred calculations run to completion.

    hashtag
    Define a Python Factorial Function

    Here’s a recursive Python function to calculate factorial. Note how concise it is and how well it mirrors the definition shown above:>>>

    A little embellishment of this function with some statements gives a clearer idea of the call and return sequence:>>>

    Notice how all the recursive calls stack up. The function gets called with n = 4, 3, 2, and 1 in succession before any of the calls return. Finally, when n is 1, the problem can be solved without any more recursion. Then each of the stacked-up recursive calls unwinds back out, returning 1, 2, 6, and finally 24 from the outermost call.

    Recursion isn’t necessary here. You could implement factorial() iteratively using a loop:>>>

    You can also implement factorial using Python’s , which you can from the functools module:>>>

    Again, this shows that if a problem is solvable with recursion, there will also likely be several viable non-recursive solutions as well. You’ll typically choose based on which one results in the most readable and intuitive code.

    Another factor to take into consideration is execution speed. There can be significant performance differences between recursive and non-recursive solutions. In the next section, you’ll explore these differences a little further.

    hashtag
    Speed Comparison of Factorial Implementations

    To evaluate execution time, you can use a function called from a module that is also called timeit. This function supports a number of different formats, but you’ll use the following format in this tutorial:

    timeit() first executes the commands contained in the specified <setup_string>. Then it executes <command> the given number of <iterations> and reports the cumulative execution time in seconds:>>>

    Here, the setup parameter assigns string the value 'foobar'. Then timeit() prints string one hundred times. The total execution time is just over 3/100 of a second.

    The examples shown below use timeit() to compare the recursive, iterative, and reduce() implementations of factorial from above. In each case, setup_string contains a setup string that defines the relevant factorial() function. timeit() then executes factorial(4) a total of ten million times and reports the aggregate execution.

    First, here’s the recursive version:>>>

    Next up is the iterative implementation:>>>

    Last, here’s the version that uses reduce():>>>

    In this case, the iterative implementation is the fastest, although the recursive solution isn’t far behind. The method using reduce() is the slowest. Your mileage will probably vary if you try these examples on your own machine. You certainly won’t get the same times, and you may not even get the same ranking.

    Does it matter? There’s a difference of almost four seconds in execution time between the iterative implementation and the one that uses reduce(), but it took ten million calls to see it.

    If you’ll be calling a function many times, you might need to take execution speed into account when choosing an implementation. On the other hand, if the function will run relatively infrequently, then the difference in execution times will probably be negligible. In that case, you’d be better off choosing the implementation that seems to express the solution to the problem most clearly.

    For factorial, the timings recorded above suggest a recursive implementation is a reasonable choice.

    Frankly, if you’re coding in Python, you don’t need to implement a factorial function at all. It’s already available in the standard :>>>

    Perhaps it might interest you to know how this performs in the timing test:>>>

    Wow! math.factorial() performs better than the best of the other three implementations shown above by roughly a factor of 10.

    Technical note: The fact that math.factorial() is so much speedier probably has nothing to do with whether it’s implemented recursively. More likely it’s because the function is implemented in rather than Python. For more reading on Python and C, see these resources:

    A function implemented in C will virtually always be faster than a corresponding function implemented in pure Python.

    hashtag
    Traverse a Nested List

    The next example involves visiting each item in a nested list structure. Consider the following Python :

    As the following diagram shows, names contains two sublists. The first of these sublists itself contains another sublist:

    Suppose you wanted to count the number of leaf elements in this list—the lowest-level str objects—as though you’d flattened out the list. The leaf elements are "Adam", "Bob", "Chet", "Cat", "Barb", "Bert", "Alex", "Bea", "Bill", and "Ann", so the answer should be 10.

    Just calling len() on the list doesn’t give the correct answer:>>>

    len() counts the objects at the top level of names, which are the three leaf elements "Adam", "Alex", and "Ann" and two sublists ["Bob", ["Chet", "Cat"], "Barb", "Bert"] and ["Bea", "Bill"]:>>>

    What you need here is a function that traverses the entire list structure, sublists included. The algorithm goes something like this:

    1. Walk through the list, examining each item in turn.

    2. If you find a leaf element, then add it to the accumulated count.

    3. If you encounter a sublist, then do the following:

    Note the self-referential nature of this description: Walk through the list. If you encounter a sublist, then similarly walk through that list. This situation begs for recursion!

    hashtag
    Traverse a Nested List Recursively

    Recursion fits this problem very nicely. To solve it, you need to be able to determine whether a given list item is leaf item or not. For that, you can use the built-in Python function .

    In the case of the names list, if an item is an instance of type list, then it’s a sublist. Otherwise, it’s a leaf item:>>>

    Now you have the tools in place to implement a function that counts leaf elements in a list, accounting for sublists recursively:

    If you run count_leaf_items() on several lists, including the names list defined above, you get this:>>>

    As with the factorial example, adding some statements helps to demonstrate the sequence of recursive calls and values:

    hashtag
    Homework

    hashtag
    Binary Search

    hashtag
    Which logarithmic expression is identical to the following exponential expression?

    2^n = 64 log_2(64)=n

    hashtag
    item_list must be sorted from least to greatest.

    For a given positive integer n determine if it can be represented as a sum of two (possibly equal).

    Example

    • For n = 1, the output should be fibonacciSimpleSum2(n) = true.

      Explanation: 1 = 0 + 1 = F~0~ + F~1~.

    • For n = 11, the output should be fibonacciSimpleSum2(n) = true

    Input/Output

    • [execution time limit] 4 seconds (py3)

    • [input] integer n

      Guaranteed constraints: 1 ≤ n ≤ 2 - 10^9^.

    Given an integer array nums sorted in ascending order, and an integer target.

    Suppose that nums is rotated at some pivot unknown to you beforehand (i.e., [1,2,3,4,5,6,7] might become [4,5,6,7,1,2,3]).

    You should search for target in nums and if found return its index, otherwise return -1.

    Example 1: Input: nums = [6,7,1,2,3,4,5], target = 1 Output: 2

    Example 2: Input: nums = [6,7,1,2,3,4,5], target = 3 Output: 4

    Example 3: Input: nums = [1], target = 2 Output: -1

    Your solution should have better than O(n) time complexity over the number of items in the list. There is an O(log n) solution. There is also an O(1) solution.

    Note:

    1. 1 <= nums.length < 100

    2. 1 <= nums[i] <= 100

    3. All values of nums are unique.

    What keywords should you look out for that might alert you that logarithms are involved?

  • Drop down into that sublist and similarly walk through it.

  • Once you’ve exhausted the sublist, go back up, add the elements from the sublist to the accumulated count, and resume the walk through the parent list where you left off.

  • .

    Explanation: 11 = 3 + 8 = F~4~ + F~6~.

  • For n = 60, the output should be fibonacciSimpleSum2(n) = true.

    Explanation: 60 = 5 + 55 = F~5~ + F~10~.

  • For n = 66, the output should be fibonacciSimpleSum2(n) = false.

  • [output] boolean

    true if n can be represented as F~i~ + F~j~, false otherwise.

    Numbers from 1 up to the length of the list will be contained in the list.

  • [execution time limit] 4 seconds (py3)

  • [input] array.integer nums

  • [input] integer target

  • [output] integer

  • https://www.mathsisfun.com/algebra/logarithms.html (Links to an external site.)arrow-up-right
    https://www.interviewcake.com/article/python3/logarithmsarrow-up-right
    Dictionary.com:arrow-up-right
    Wiktionary:arrow-up-right
    The Free Dictionary:arrow-up-right
    arrow-up-right
    tree-like data structuresarrow-up-right
    local namespacearrow-up-right
    collidearrow-up-right
    tracebackarrow-up-right
    exceptionarrow-up-right
    factorialarrow-up-right
    arrow-up-right
    arrow-up-right
    arrow-up-right
    print()arrow-up-right
    forarrow-up-right
    reduce()arrow-up-right
    importarrow-up-right
    timeit()arrow-up-right
    math modulearrow-up-right
    Carrow-up-right
    Python Bindings: Calling C or C++ From Pythonarrow-up-right
    Building a Python C Extension Modulearrow-up-right
    C for Python Programmersarrow-up-right
    listarrow-up-right
    arrow-up-right
    isinstance()arrow-up-right
    print()arrow-up-right
    returnarrow-up-right
    Fibonacci numbersarrow-up-right
    2^5
    2 * 2 * 2 * 2 * 2
    2^5 = 32
    2^0 = 1
    2^{-1} = \frac{1}{2}
    log_2 32 = 5
    log_2 1 = 0
    log_2 \frac{1}{2} = 1
    2^5 = 32
    log_2 32 = 5
    The list must be ordered from least to greatest
    def linear_search(arr, target):
        # loop through each item in the input array
        for idx in range(len(arr)):
            # check if the item at the current index is equal to the target
            if arr[idx] == target:
                # return the current index as the match
                return idx
        # if we were able to loop through the entire array, the target is not present
        return -1
    def function():
        x = 10
        function()
    >>> function()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in function
      File "<stdin>", line 3, in function
      File "<stdin>", line 3, in function
      [Previous line repeated 996 more times]
    RecursionError: maximum recursion depth exceeded
    >>> from sys import getrecursionlimit
    >>> getrecursionlimit()
    1000
    >>> from sys import setrecursionlimit
    >>> setrecursionlimit(2000)
    >>> getrecursionlimit()
    2000
    >>> def countdown(n):
    ...     print(n)
    ...     if n == 0:
    ...         return             # Terminate recursion
    ...     else:
    ...         countdown(n - 1)   # Recursive call
    ...
    
    >>> countdown(5)
    5
    4
    3
    2
    1
    0
    def countdown(n):
        print(n)
        if n > 0:
            countdown(n - 1)
    >>> def countdown(n):
    ...     while n >= 0:
    ...         print(n)
    ...         n -= 1
    ...
    
    >>> countdown(5)
    5
    4
    3
    2
    1
    0
    >>> def factorial(n):
    ...     return 1 if n <= 1 else n * factorial(n - 1)
    ...
    
    >>> factorial(4)
    24
    >>> def factorial(n):
    ...     print(f"factorial() called with n = {n}")
    ...     return_value = 1 if n <= 1 else n * factorial(n -1)
    ...     print(f"-> factorial({n}) returns {return_value}")
    ...     return return_value
    ...
    
    >>> factorial(4)
    factorial() called with n = 4
    factorial() called with n = 3
    factorial() called with n = 2
    factorial() called with n = 1
    -> factorial(1) returns 1
    -> factorial(2) returns 2
    -> factorial(3) returns 6
    -> factorial(4) returns 24
    24
    >>> def factorial(n):
    ...     return_value = 1
    ...     for i in range(2, n + 1):
    ...         return_value *= i
    ...     return return_value
    ...
    
    >>> factorial(4)
    24
    >>> from functools import reduce
    >>> def factorial(n):
    ...     return reduce(lambda x, y: x * y, range(1, n + 1) or [1])
    ...
    
    >>> factorial(4)
    24
    timeit(<command>, setup=<setup_string>, number=<iterations>)
    >>> from timeit import timeit
    
    >>> timeit("print(string)", setup="string='foobar'", number=100)
    foobar
    foobar
    foobar
       .
       . [100 repetitions]
       .
    foobar
    0.03347089999988384
    >>> setup_string = """
    ... print("Recursive:")
    ... def factorial(n):
    ...     return 1 if n <= 1 else n * factorial(n - 1)
    ... """
    
    >>> from timeit import timeit
    >>> timeit("factorial(4)", setup=setup_string, number=10000000)
    Recursive:
    4.957105500000125
    >>> setup_string = """
    ... print("Iterative:")
    ... def factorial(n):
    ...     return_value = 1
    ...     for i in range(2, n + 1):
    ...         return_value *= i
    ...     return return_value
    ... """
    
    >>> from timeit import timeit
    >>> timeit("factorial(4)", setup=setup_string, number=10000000)
    Iterative:
    3.733752099999947
    >>> setup_string = """
    ... from functools import reduce
    ... print("reduce():")
    ... def factorial(n):
    ...     return reduce(lambda x, y: x * y, range(1, n + 1) or [1])
    ... """
    
    >>> from timeit import timeit
    >>> timeit("factorial(4)", setup=setup_string, number=10000000)
    reduce():
    8.101526299999932
    >>> from math import factorial
    >>> factorial(4)
    24
    >>> setup_string = "from math import factorial"
    
    >>> from timeit import timeit
    >>> timeit("factorial(4)", setup=setup_string, number=10000000)
    0.3724050999999946
    names = [
        "Adam",
        [
            "Bob",
            [
                "Chet",
                "Cat",
            ],
            "Barb",
            "Bert"
        ],
        "Alex",
        [
            "Bea",
            "Bill"
        ],
        "Ann"
    ]
    >>> len(names)
    5
    >>> for index, item in enumerate(names):
    ...     print(index, item)
    ...
    0 Adam
    1 ['Bob', ['Chet', 'Cat'], 'Barb', 'Bert']
    2 Alex
    3 ['Bea', 'Bill']
    4 Ann
    >>> names
    ['Adam', ['Bob', ['Chet', 'Cat'], 'Barb', 'Bert'], 'Alex', ['Bea', 'Bill'], 'Ann']
    
    >>> names[0]
    'Adam'
    >>> isinstance(names[0], list)
    False
    
    >>> names[1]
    ['Bob', ['Chet', 'Cat'], 'Barb', 'Bert']
    >>> isinstance(names[1], list)
    True
    
    >>> names[1][1]
    ['Chet', 'Cat']
    >>> isinstance(names[1][1], list)
    True
    
    >>> names[1][1][0]
    'Chet'
    >>> isinstance(names[1][1][0], list)
    False
    def count_leaf_items(item_list):
        """Recursively counts and returns the
           number of leaf items in a (potentially
           nested) list.
        """
        count = 0
        for item in item_list:
            if isinstance(item, list):
                count += count_leaf_items(item)
            else:
                count += 1
    
        return count
    >>> count_leaf_items([1, 2, 3, 4])
    4
    >>> count_leaf_items([1, [2.1, 2.2], 3])
    4
    >>> count_leaf_items([])
    0
    
    >>> count_leaf_items(names)
    10
    >>> # Success!
     1def count_leaf_items(item_list):
     2    """Recursively counts and returns the
     3       number of leaf items in a (potentially
     4       nested) list.
     5    """
     6    print(f"List: {item_list}")
     7    count = 0
     8    for item in item_list:
     9        if isinstance(item, list):
    10            print("Encountered sublist")
    11            count += count_leaf_items(item)
    12        else:
    13            print(f"Counted leaf item \"{item}\"")
    14            count += 1
    15
    16    print(f"-> Returning count {count}")
    17    return count
    def fibonacciSimpleSum2(n):
        # if 0 is less than n and n is less than 5 then we know we can return
        # true because n will be 1-4 which can be created with 2 fib numbers
        if 0 < n < 5:
            return True
    
        # first get fibonacci sequence up to n
        seq = [0, 1]
        # starting from 2 and ending at n
        for i in range(2, n):
            # add seq at i - 2 (0 to start) and seq at i - 1 (1 to start)
            fib = seq[i - 2] + seq[i - 1]
            # if n is greater than fib
            if n >= fib:
                # we can append fib to the sequence
                seq.append(fib)
                # if fib is greater than or equal to n we can stop
            else:
                break
        print(seq)
    
        # The check I googled
        # for i, number in enumerate(seq[:-1]):
        #     paired = n - number
        #     if paired in seq[i + 1:]:
        #         return True
    
        # check if any 2 of the numbers in seq add up to n
        # My check
        for i in range(len(seq) - 1):  # O(n^2)
            j = 0
            while (seq[i] + seq[j]) != n:
                if j == len(seq) - 1:
                    break
                else:
                    j += 1
            if seq[i] + seq[j] == n:
                return True
    
        return False
    
    
    print(fibonacciSimpleSum2(5))
    def csSearchRotatedSortedArray(nums, target):
        min = 0
        max = len(nums) - 1
    
        while not max < min:
            guess = (max + min) // 2
            # print(f'min: {nums[min]} max: {nums[max]} guess:{nums[guess]} target:'
            #       f' {target}')
            # if the guess is the target we got it and return the guess
            if nums[guess] == target:
                # print('guessed the target')
                return guess
            # if min is less than or equal to the guess
            elif nums[min] <= nums[guess]:
                # print('min less than guess')
                # if min is less than or equal to the target and less than the guess
                if nums[min] <= target < nums[guess]:
                    # print('min less than or equal to target and less than guess')
                    # we can set max to the guess because nothing past the guess
                    # can be the target
                    max = guess
                    # else we can set min to guess + 1 because nothing before it
                    # can be the target
                else:
                    # print('min is greater than target and greater than or equal '
                    #       'to guess')
                    min = guess + 1
            # else if min is greater than the guess
            else:
                print('min is greater than or equal to guess')
                # if max - 1 is greater than the target and greater than the guess
                if nums[max - 1] >= target > nums[guess]:
                    # print('max - 1 greater than or equal to target and greater '
                    #       'than guess')
                    # we can set min to guess plus one because nothing before it
                    # can be the target
                    min = guess + 1
                else:
                    # print('max -1 less than target and less than or equal to guess')
                    # else we set max equal to guess because nothing after it can
                    # be the target
                    max = guess
    
        return -1C:\Lambda\CIRRICULUMN_NOTES\CS-python-notes_WEEKS\wk18\d4\homework\2021-09-09-14-03-46.png
    Your Guide to the CPython Source Codearrow-up-right
    CPython Internals bookarrow-up-right
    Notion—The AI workspace that works for you.Notionchevron-right

    Practice

    hashtag
    K-most-Frequent:

    def top_k_frequent(words, k):
      """
      Input:
      words -> List[str]
      k -> int
      Output:
      List[str]
      """
      frequency = {}
    
      for word in words:
        if word in frequency:
          frequency[word] += 1
        else:
          frequency[word] = 1
    
      sorted_data = sorted(frequency, key=lambda word: (-frequency[word], word))
      
      return sorted_data[:k]
    
    def helper(word):
    
    
      return word
    
    # Tests
    print(top_k_frequent(["the", "sky", "is", "cloudy", "the", "the", "the", "cloudy", "is", "is"], 4))
    print(top_k_frequent(["lambda", "school", "rules", "lambda", "school", "rocks"], 2))
    
    
    # # Output
    # 
    # ['the', 'is', 'cloudy', 'sky']
    # ['lambda', 'school']
    
    def djb2(key):
      """
      DJB2 hash, 32-bit
      """
      # Cast the key to a string and get bytes
      str_key = str(key).encode()
      # Start from an arbitrary large prime
      hash_value = 5381
      # Bit-shift and sum value for each character
      for b in str_key:
          hash_value = ((hash_value << 5) + hash_value) + b
          hash_value &= 0xffffffff  # DJB2 is a 32-bit hash, only keep 32 bits
      return hash
    
    
    def fnv1(key):
      """
      FNV-1 hash, 64-bit
      """
      # Cast the key to a string and get bytes
      str_key = str(key).encode()
      hash = 0x00000100000001B3  # FNV Prime
      for b in str_key:
          hash *= 0xcbf29ce484222325  # FNV Offset Basis
          hash ^= b
          hash &= 0xffffffffffffffff  # 64-bit hash
      return hash
    
    
    """# Load Factor
    - if load factor is greater than 70% then double capacity of storage
    - if load factor is less than 20% half capacity  storage
    """
    """
    1. Write the delete method with the assumption that linked list chaining was used for collision resolution.
    2. Write the get method with the assumption that linked list chaining was used for collision resolution.
    """
    
    
    class HashTableEntry:
      """
      Linked List hash table key/value pair
      """
    
      def __init__(self, key, value):
          self.key = key
          self.value = value
          self.next = None
    # Hash table can't have fewer than this many slots
    # MIN_CAPACITY = 8
    
    
    class HashTable:
      """
      A hash table that with `capacity` buckets
      that accepts string keys
      Implement this.
      """
    
      def __init__(self, capacity):
        self.capacity = capacity  # Number of buckets in the hash table
        self.storage = [None] * capacity
        self.item_count = 0
        self.MIN_CAPACITY = 8
    
      def get_num_slots(self):
        """
        Return the length of the list you're using to hold the hash
        table data. (Not the number of items stored in the hash table,
        but the number of slots in the main list.)
        """
        return len(self.storage)
    
      def get_load_factor(self):
        """
        Return the load factor for this hash table.
        Implement this.
        """
        return self.item_count / self.capacity
    
      def djb2(self, key):
        """
        DJB2 hash, 32-bit
        Implement this, and/or FNV-1.
        """
        # Cast the key to a string and get bytes
        str_key = str(key).encode()
        # Start from an arbitrary large prime
        hash_value = 5381
        # Bit-shift and sum value for each character
        for b in str_key:
            hash_value = ((hash_value << 5) + hash_value) + b
            hash_value &= 0xffffffff  # DJB2 is a 32-bit hash, only keep 32 bits
        return hash_value
    
      def hash_index(self, key):
        """
        Take an arbitrary key and return a valid integer index
        between within the storage capacity of the hash table.
        """
        return self.djb2(key) % self.capacity
    
      def put(self, key, value):
        """
        Store the value with the given key.
        Hash collisions should be handled with Linked List Chaining.
        Implement this.
        """
        index = self.hash_index(key)
        current_entry = self.storage[index]
        while current_entry is not None and current_entry.key != key:
            current_entry = current_entry.next
        if current_entry is not None:
            current_entry.value = value
        else:
            new_entry = HashTableEntry(key, value)
            new_entry.next = self.storage[index]
            self.storage[index] = new_entry
        # increment the item count
        self.item_count += 1
        # resize based on load factor reaching higher than 70% (using a doubling strategy)
        if self.get_load_factor() > 0.7:
          self.resize(self.capacity * 2)
    
      def delete(self, key):
        """
        Remove the value stored with the given key.
        Print a warning if the key is not found.
        Implement this.
        """
        index = self.hash_index(key)
        current_entry = self.storage[index]
        last_entry = None
        while current_entry is not None and current_entry.key != key:
            last_entry = current_entry
            current_entry = last_entry.next
        if current_entry is None:
            print("ERROR: Unable to remove the entry with a key of", key)
        else:
            if last_entry is None:
                self.storage[index] = current_entry.next
            else:
                last_entry.next = current_entry.next
            # decrement the item count
            self.item_count -= 1
        # TODO:  resizing?
        if self.get_load_factor() < 0.2:
          if self.capacity > self.MIN_CAPACITY:
            new_capacity = self.capacity // 2
            if new_capacity < self.MIN_CAPACITY:
              new_capacity = self.MIN_CAPACITY
            self.resize(new_capacity)
    
      def get(self, key):
        """
        Retrieve the value stored with the given key.
        Returns None if the key is not found.
        Implement this.
        """
        index = self.hash_index(key)
        current_entry = self.storage[index]
        # while the current entry exists
        while current_entry is not None:
            # check if the current entry key is the same as the passed in key
            if current_entry.key == key:
                # return the current entry value
                return current_entry.value
            # traverse to the next entry
            current_entry = current_entry.next
        return None
    
      def resize(self, new_capacity):  # O(n * k)
        """
        Changes the capacity of the hash table and rehashes all of the key / value pairs
        """
        # hold a ref to the old storage
        old_storage = self.storage
        # set the new capacity
        self.capacity = new_capacity
        # create the new storage
        self.storage = [None] * self.capacity
        # create a placeholder for the current entrys reference
        current = None
        # keep a copy of the original item count
        old_count = self.item_count
        # iterate over each bucket in the old storage
        for bucket_item in old_storage:
          # get the current entry
          current = bucket_item
          # while the current entry exists
          while current:
            # put the current entrys key value pair in to the new storage
            self.put(current.key, current.value)
            # traverse to the next entry
            current = current.next
        # restore the item count
        self.item_count = old_count
    
    
    if __name__ == "__main__":
        ht = HashTable(8)
        ht.put("line_1", "'Twas brillig, and the slithy toves")
        ht.put("line_2", "Did gyre and gimble in the wabe:")
        ht.put("line_3", "All mimsy were the borogoves,")
        ht.put("line_4", "And the mome raths outgrabe.")
        ht.put("line_5", '"Beware the Jabberwock, my son!')
        ht.put("line_6", "The jaws that bite, the claws that catch!")
        ht.put("line_7", "Beware the Jubjub bird, and shun")
        ht.put("line_8", 'The frumious Bandersnatch!"')
        ht.put("line_9", "He took his vorpal sword in hand;")
        ht.put("line_10", "Long time the manxome foe he sought--")
        ht.put("line_11", "So rested he by the Tumtum tree")
        ht.put("line_12", "And stood awhile in thought.")
        print("")
        # Test storing beyond capacity
        for i in range(1, 13):
            print(ht.get(f"line_{i}"))
        # Test resizing
        old_capacity = ht.get_num_slots()
        ht.resize(ht.capacity * 2)
        new_capacity = ht.get_num_slots()
        print(f"\nResized from {old_capacity} to {new_capacity}.\n")
        # Test if data intact after resizing
        for i in range(1, 13):
            print(ht.get(f"line_{i}"))
        print("")
    """# Demo"""
    """
    You are given a non-empty list of words.
    Write a function that returns the *k* most frequent elements.
    The list that you return should be sorted by frequency from highest to lowest.
    If two words have the same frequency, then the word with the lower alphabetical
    order should come first.
    Example 1:
    ```plaintext
    Input:
    words = ["lambda", "school", "rules", "lambda", "school", "rocks"]
    k = 2
    Output:
    ["lambda", "school"]
    Explanation:
    "lambda" and "school" are the two most frequent words.
    ```
    Example 2:
    ```plaintext
    Input:
    words = ["the", "sky", "is", "cloudy", "the", "the", "the", "cloudy", "is", "is"]
    k = 4
    Output:
    ["the", "is", "cloudy", "sky"]
    Explanation:
    "the", "is", "cloudy", and "sky" are the four most frequent words. The words
    are sorted from highest frequency to lowest.
    ```
    Notes:
    - `k` is always valid: `1 <= k <= number of unique elements.
    - words in the input list only contain lowercase letters.
    ```
    ["the", "sky", "is", "cloudy", "the", "the", "the", "cloudy", "is", "is"]
    freq = {"the": 4, "sky": 1, "is": 3, "cloudy": 2} sort this based on the values
    res = ["the", "is", "cloudy", "sky"]
    res[:k]
    # get the freq of each word
    # sort the dictionary besed on keys (maybe use a lambda)
    # encapsulate this sorted dictionary inside another function to sort it alpha?
    # then slice the results to constrain to k
    """
    
    
    def top_k_frequent(words, k):
      """
      Input:
      words -> List[str]
      k -> int
      Output:
      List[str]
      """
      frequency = {}
      for word in words:
        if word in frequency:
          frequency[word] += 1
        else:
          frequency[word] = 1
      print(frequency)
    
    
    # Tests
    top_k_frequent(["the", "sky", "is", "cloudy", "the",
                    "the", "the", "cloudy", "is", "is"], 4)
    # top_k_frequent(["lambda", "school", "rules", "lambda", "school", "rocks"], 2)
    
    Recursive definition of ancestors
    Definition of factorial
    Nested list example
    Factorial illustration
    2^n = 64
    Logo
    log_2 128 = n
    Recursive definition of factorial
    Google Colabcolab.research.google.comchevron-right
    Logo