Many applications require a dynamic set that supports only the dictionary operations INSERT, SEARCH, DELETE. Under reasonable assumptions, the average time to search for an element in a hash table is O(1). The worst case is O(n) when there are as many as n collision when we are doing searching.
A hash table typically uses an array of size proportional to the number of keys actually stored. Instead of using the key as an array index directly, the array index is computed from the key using a hash function. When more than one key maps to the same array index, this is called “collision”, which is usually handled by “Chaining” that we store a linked list in the collide index position.
Hash function
Hash function maps the universe U of keys into the slots of a hashtable T[0..m-1]. With hash function the element is stored in h(k).
h: U -> {0,1,..., m-1},
where the size m
of the hash table is much less than |U|
. With hash function, the range of array indices and the size of the array is reduced.
Collision
When two keys hash to the same slot, this situation is called collision. The idea to solve collision is to avoid collisions.
One idea is to make h
appear to be “random”, thus avoiding collisions or at least minimizing their number. Because |U|>m
, there must be at least two keys that have the same hash value; avoiding collisions altogether is therefore impossible. Thus, we use a well-designed, “random”-looking hash function to minimize the number of collisions. However, we still need a method for resolving the collisions that do occur.
Resolving Collision by Chaining
In chaining, we place all the elements that hash to the same slot into the same linked list.Fig 1: Hash table and chaining. Figure from here.
Common Used Hash Functions
A good hash function satisfied the condtion of simple uniform hashing: each key is equally likely to has to any of the m
slots.
Step 1: interpreting keys as nature numbers N = {0,1,2, …}. For string or character, we might translate pt as the pair of decimal integers (112,116)
with their ASCII character; then we express it as a radix-128 integer, then the number we get is (112*128)+116 = 14452.
Step 2: Define hashing functions
The division method, h(k) = k mod m
The multiplication method, h(k)=
⌊m(kA mod 1)⌋. kA mod 1
means the fractional part of kA
, which can be noted as {kA}
, and equals to kA-
⌊ kA ⌋, 0<A<1
. e.g. for 45.2 the fractional part of it is .2.
Implementation with Python List
Python built-in hashmap data structure
The set classes are implemented using dictionaries. Accordingly, the requirements for set elements are the same as those for dictionary keys; namely, that the element defines both __eq__()
and __hash__()
. As a result, sets cannot contain mutable elements such as lists or dictionaries. However, they can contain immutable collections such as tuples or instances of ImmutableSet
. For convenience in implementing sets of sets, inner sets are automatically converted to immutable form, for example, Set([Set(['dog'])])
is transformed to Set([ImmutableSet(['dog'])])
.
OrderedDict
Standard dictionaries are unordered, which means that any time you loop through a dictionary, you will go through every key, but you are not guaranteed to get them in any particular order.
The OrderedDict from the collections module is a special type of dictionary that keeps track of the order in which its keys were inserted. Iterating the keys of an orderedDict has predictable behavior. This can simplify testing and debugging by making all the code deterministic.
defaultDict
Dictionaries are useful for bookkeeping and tracking statistics. One problem is that when we try to add an element, we have no idea if the key is present or not, which requires us to check such condition every time.
The defaultdict class from the collections module simplifies this process by pre-assigning a default value when a key does not present. For different value type it has different default value, for example, for int, it is 0 as the default value.
Time Complexity for Operations
Search, Insert, Delete: O(1). Check here for more details.
Stacks and queue are dynamic lists in which the element removed from the list by the DELETE operation is prespecified. The operation of PUSH and POP takes O(1) time complexity. These two structure are good for problems that we deal with element in either FIFO or LIFO manner.
queue: it is first in, first out, FIFO, which can be used to implement iterative BFS. The following code is the basic implementation with Python.
stack: the element deleted is the most recently inserted, Last in first out, LIFO, which can be used to implement iterative DFS. The following code is the basic implementation with Python.
Deque (operation from both side): In Python, the deque class from the collections module is a double-ended queue. It provides constant time opearations for inserting ore removing items from its beginning or end. It can be used to implement both stack and queue structure we mentioned above and usually is more time efficient than the above implementation. Mainly because inserting or removing elements from the head of a list takes linear time.
Heap
As we can see we can implement either the max-heap or the min-heap as an array. Because the tree is complete, the left child of a parent (at position p) is the node that is found in position 2p in the list. Similarly, the right child of the parent is at position 2p+1 in the list. To find the parent of any node in the tree, we can simply use Python’s integer division. Given that a node is at position n in the list, the parent is at position n/2.
There are two kinds of binary heaps: max-heaps and min-heaps. In both kinds, the values in the nodes satisfy a heap property. For max-heap, the property states as for every node i
other than root.
Thus, the largest element in a max-heap is stored at the root.
For a heap of n
elements the height is theta(logn) because it is a complete binary tree.
Heap can be used into heapsort and a priority-queue data structure. Operations include:
MAX-HEAPIFY, runs in O(lgn), is the key to maintaining the max-heap property
BUILD-MAX-HEAP, runs in linear time, produces a maxheap from an unordered input arrary
MAX-HEAP-INSERT, HEAP-EXTRACT-MAX, HEAP-INCREASE-KEY, and HEAP-MAXIMUM, runs in O(lgn) time, allow the heap data structure to implement a priority queue.
heapq: heapq from collections is an implementation of heap, which can be used to maintain a priority queue. Operations include heappush, heappop, and nsmallest. heapq in python to maintain a priority queue with O(logn)
Items are removed by the highest priority or say the lowest number first. Also, accessing the 0 index of the heap will return the smallest item.
More materials can be found here.
Monotonous Stack: For monotonous increasing stack, which only allow the increasing element to be put in the stack, smaller one came will kick out the larger one in the previous position, untill we found one that is smaller than the current element. For the monotonous decreasing stack, the larger elements will force the stack to kick out the previous smaller element untill larger one found. In monotonous stack, we only operate at the end of the stack. To summarize, monotonous stack has two features:
monotonic
when adding new element, we will delete all the previous elements that break the first monotonic feature.
For increasing stack: we can find the first element to the left that is larger than current element
For decreasing stack: we can find the first element to the left that is smaller than current element.
Why Linked List? Arrays can be used to store linear data of similar types, but arrays have following limitations. 1) The size of the arrays is fixed: So we must know the upper limit on the number of elements in advance. Also, generally, the allocated memory is equal to the upper limit irrespective of the usage. 2) Inserting a new element in an array of elements is expensive, because room has to be created for the new elements and to create room existing elements have to shifted.
Advantages over arrays 1) Dynamic size 2) Ease of insertion/deletion
Drawbacks: 1) Random access is not allowed. We have to access elements sequentially starting from the first node. So we cannot do binary search with linked lists. 2) Extra memory space for a pointer is required with each element of the list.
For Linked List, we can only iterate over elements, for python code example:
Dummy Node in Linked List
Remove Duplicates from Sorted List II
Reverse Linked List II
Partition List
Basic Linked List Skills
Sort List
Reorder List
Two Pointers in Linked List (Fast-slow pointers)
Merge K Sorted Lists
Binary Tree and Binary Search Tree, please see my following post.Solving Tree Problems on LeetCodePart of this great node comes from blog:medium.com
[1] Cormen, Thomas H. Introduction to algorithms. MIT press, 2009.
[2] Géron, Aurélien. Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems. “ O’Reilly Media, Inc.”, 2017.
[3]https://www.geeksforgeeks.org/heap-queue-or-heapq-in-python/
The (binary) heap data structure is an array that we can view as a nearly complete binary tree. The tree is completely filled on all levels except possibly the lowest, which is filled from left up to a point.(a) binary tree (b) an array
Like arrays, Linked List is a linear data structure. Unlike arrays, linked list elements are not stored at contiguous location; the elements are linked using pointers.
Describe an implementation of a least-used cache, and big-O notation of it.
A question involving an API's integration with hash map where the buckets of hash map are made up of linked lists.
Implement data structure Map
storing pairs of integers (key, value) and define following member functions in O(1) runtime: void insert(key, value)
, void delete(key)
, int get(key)
, int getRandomKey()
.
Data requires a number of ways in which it can be stored and accessed. One of the most important implementations includes Hash Tables. In Python, these Hash tables are implemented through the built-in data type i.e, dictionary. In this article, you will learn what are Hash Tables and Hashmaps in Python and how you can implement them using dictionaries.
Before moving ahead, let us take a look at all the topics of discussion:
What is a Hash table or a Hashmap in Python?
Hash table vs Hashmap
Creating Dictionaries
Creating Nested Dictionaries
Performing Operations on Hash Tables using dictionaries
Accessing Values
Updating Values
Deleting Items
Converting a Dictionary into a Dataframe
In computer science, a Hash table or a Hashmap is a type of data structure that maps keys to its value pairs (implement abstract array data types). It basically makes use of a function that computes an index value that in turn holds the elements to be searched, inserted, removed, etc. This makes it easy and fast to access data. In general, hash tables store key-value pairs and the key is generated using a hash function.
Hash tables or has maps in Python are implemented through the built-in dictionary data type. The keys of a dictionary in Python are generated by a hashing function. The elements of a dictionary are not ordered and they can be changed.
An example of a dictionary can be a mapping of employee names and their employee IDs or the names of students along with their student IDs.
Moving ahead, let’s see the difference between the hash table and hashmap in Python.
Dictionaries can be created in two ways:
Using curly braces ({})
Using the dict() function
Dictionaries in Python can be created using curly braces as follows:
EXAMPLE:
OUTPUT:
{‘Dave’: ‘001’, ‘Ava’: ‘002’, ‘Joe’: ‘003’} dict
Python has a built-in function, dict() that can be used to create dictionaries in Python. For example:
EXAMPLE:
OUTPUT:
{} dict
In the above example, an empty dictionary is created since no key-value pairs are supplied as a parameter to the dict() function. In case you want to add values, you can do as follows:
EXAMPLE:
OUTPUT:
{‘Dave’: ‘001’, ‘Ava’: ‘002’, ‘Joe’: ‘003’} dict
Nested dictionaries are basically dictionaries that lie within other dictionaries. For example:
EXAMPLE:
There are a number of operations that can be performed on has tables in Python through dictionaries such as:
Accessing Values
Updating Values
Deleting Element
The values of a dictionary can be accessed in many ways such as:
Using key values
Using functions
Implementing the for loop
Dictionary values can be accessed using the key values as follows:
EXAMPLE:
OUTPUT: ‘ 001′
There are a number of built-in functions that can be used such as get(), keys(), values(), etc.
EXAMPLE:
OUTPUT:
dict_keys(‘Dave’, ‘Ava’, ‘Joe’‘Dave’,‘Ava’,‘Joe’) dict_values(‘001’, ‘002’, ‘003’‘001’,‘002’,‘003’) 001
The for loop allows you to access the key-value pairs of a dictionary easily by iterating over them. For example:
OUTPUT:
All keys Dave Ava Joe All values 001 002 003 All keys and values Dave : 001 Ava : 002 Joe : 003
Dictionaries are mutable data types and therefore, you can update them as and when required. For example, if I want to change the ID of the employee named Dave from ‘001’ to ‘004’ and if I want to add another key-value pair to my dictionary, I can do as follows:
EXAMPLE:
OUTPUT: {‘Dave’: ‘004’, ‘Ava’: ‘002’, ‘Joe’: ‘003’, ‘Chris’: ‘005’}
There a number of functions that allow you to delete items from a dictionary such as del(), pop(), popitem(), clear(), etc. For example:
EXAMPLE:
OUTPUT: {‘Joe’: ‘003’}
The above output shows that all the elements except ‘Joe: 003’ have been removed from the dictionary using the various functions.
As you have seen previously, I have created a nested dictionary containing employee names and their details mapped to it. Now to make a clear table out of that, I will make use of the pandas library in order to put everything as a dataframe.
EXAMPLE:
OUTPUT: