1 of 3

Hash Table

"""
Hashtables (associative arrays)
"""


# linear probing
class HashTable(object):

    def __init__(self):
        self.size = 10
        self.keys = [None] * self.size
        self.values = [None] * self.size

    def put(self, key, data):

        index = self.hashfunction(key)

        # not None -> it is a collision !!!
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.values[index] = data  # update
                return

            # rehash try to find another slot
            index = (index + 1) % self.size

        # insert
        self.keys[index] = key
        self.values[index] = data

    def get(self, key):

        index = self.hashfunction(key)

        while self.keys[index] is not None:
            if self.keys[index] == key:
                return self.values[index]

            index = (index + 1) % self.size

        # it means the key is not present in the associative array
        return None

    def hashfunction(self, key):
        sum = 0
        for pos in range(len(key)):
            sum = sum + ord(key[pos])

        return sum % self.size

table = HashTable()

table.put("apple", 10)
table.put("orange", 20)
table.put("car", 30)
table.put("table", 40)

print(table.get("car"))

Hash Table

Describe an implementation of a least-used cache, and big-O notation of it.
A question involving an API's integration with hash map where the buckets of hash map are made up of linked lists.
Implement data structure Map storing pairs of integers (key, value) and define following member functions in O(1) runtime: void insert(key, value), void delete(key), int get(key), int getRandomKey().

"""Write a HashTable class that stores strings
in a hash table, where keys are calculated
using the first two letters of the string."""

class HashTable(object):
    def __init__(self):
        self.table = [None]*10000

    def store(self, string):
        """Input a string that's stored in
        the table."""
        index = self.calculate_hash_value(string)
        if(self.lookup(string)==-1):
            self.table[index] = [string]
        else:
            self.table[index].append(string)
        pass

    def lookup(self, string):
        """Return the hash value if the
        string is already in the table.
        Return -1 otherwise."""
        index = self.calculate_hash_value(string)
        if(self.table[index]!=None):
            return index
        else:
            return -1

    def calculate_hash_value(self, string):
        """Helper function to calulate a
        hash value from a string."""
        hash_val = ord(string[0])*100+ord(string[1])
        return hash_val

# Setup
hash_table = HashTable()

# Test calculate_hash_value
# Should be 8568
print hash_table.calculate_hash_value('UDACITY')

# Test lookup edge case
# Should be -1
print hash_table.lookup('UDACITY')

# Test store
hash_table.store('UDACITY')
# Should be 8568
print hash_table.lookup('UDACITY')

# Test store edge case
hash_table.store('UDACIOUS')
# Should be 8568
print hash_table.lookup('UDACIOUS')

#print(hash_table.table)

Array Binary Search Tree Linked List Extra-Array Stack Binary Tree Recursion Hash Table Searching Sorting Queue Sandbox Hash Table Double Linked List Graphs Exotic Heap

Hashmap or Hash tables

1. Hashmap or Hash tables

Many applications require a dynamic set that supports only the dictionary operations INSERT, SEARCH, DELETE. Under reasonable assumptions, the average time to search for an element in a hash table is O(1). The worst case is O(n) when there are as many as n collision when we are doing searching.

A hash table typically uses an array of size proportional to the number of keys actually stored. Instead of using the key as an array index directly, the array index is computed from the key using a hash function. When more than one key maps to the same array index, this is called “collision”, which is usually handled by “Chaining” that we store a linked list in the collide index position.

Hash function

Hash function maps the universe U of keys into the slots of a hashtable T[0..m-1]. With hash function the element is stored in h(k).

h: U -> {0,1,..., m-1}, where the size m of the hash table is much less than |U| . With hash function, the range of array indices and the size of the array is reduced.

Collision

When two keys hash to the same slot, this situation is called collision. The idea to solve collision is to avoid collisions.

One idea is to make h appear to be “random”, thus avoiding collisions or at least minimizing their number. Because |U|>m , there must be at least two keys that have the same hash value; avoiding collisions altogether is therefore impossible. Thus, we use a well-designed, “random”-looking hash function to minimize the number of collisions. However, we still need a method for resolving the collisions that do occur.

Resolving Collision by Chaining

In chaining, we place all the elements that hash to the same slot into the same linked list.Fig 1: Hash table and chaining. Figure from here.

Common Used Hash Functions

A good hash function satisfied the condtion of simple uniform hashing: each key is equally likely to has to any of the m slots.

Step 1: interpreting keys as nature numbers N = {0,1,2, …}. For string or character, we might translate pt as the pair of decimal integers (112,116) with their ASCII character; then we express it as a radix-128 integer, then the number we get is (112*128)+116 = 14452.

Step 2: Define hashing functions

The division method, h(k) = k mod m
The multiplication method, h(k)=⌊m(kA mod 1)⌋. kA mod 1 means the fractional part of kA , which can be noted as {kA} , and equals to kA-⌊ kA ⌋, 0<A<1 . e.g. for 45.2 the fractional part of it is .2.

Implementation with Python List

Python built-in hashmap data structure

The set classes are implemented using dictionaries. Accordingly, the requirements for set elements are the same as those for dictionary keys; namely, that the element defines both __eq__() and __hash__(). As a result, sets cannot contain mutable elements such as lists or dictionaries. However, they can contain immutable collections such as tuples or instances of ImmutableSet. For convenience in implementing sets of sets, inner sets are automatically converted to immutable form, for example, Set([Set(['dog'])]) is transformed to Set([ImmutableSet(['dog'])]).

OrderedDict

Standard dictionaries are unordered, which means that any time you loop through a dictionary, you will go through every key, but you are not guaranteed to get them in any particular order.

The OrderedDict from the collections module is a special type of dictionary that keeps track of the order in which its keys were inserted. Iterating the keys of an orderedDict has predictable behavior. This can simplify testing and debugging by making all the code deterministic.

defaultDict

Dictionaries are useful for bookkeeping and tracking statistics. One problem is that when we try to add an element, we have no idea if the key is present or not, which requires us to check such condition every time.

dict = {}
key = "counter"
if key not in dict:
    dict[key]=0
dict[key] += 1

The defaultdict class from the collections module simplifies this process by pre-assigning a default value when a key does not present. For different value type it has different default value, for example, for int, it is 0 as the default value.

from collections import defaultdict
dict = defaultdict(int)
dict['counter']+=1

Time Complexity for Operations

Search, Insert, Delete: O(1). Check here for more details.

2. Queue, Stack, and Heap

Stacks and queue are dynamic lists in which the element removed from the list by the DELETE operation is prespecified. The operation of PUSH and POP takes O(1) time complexity. These two structure are good for problems that we deal with element in either FIFO or LIFO manner.

queue: it is first in, first out, FIFO, which can be used to implement iterative BFS. The following code is the basic implementation with Python.

q = []
q.insert[0,v]
q.pop()

stack: the element deleted is the most recently inserted, Last in first out, LIFO, which can be used to implement iterative DFS. The following code is the basic implementation with Python.

stack = []
stack.append(v)
stack.pop()

Deque (operation from both side): In Python, the deque class from the collections module is a double-ended queue. It provides constant time opearations for inserting ore removing items from its beginning or end. It can be used to implement both stack and queue structure we mentioned above and usually is more time efficient than the above implementation. Mainly because inserting or removing elements from the head of a list takes linear time.

from collections import deque
#1. implement Queue
fifo=deque()
fifo.append(l)
fifo.popleft()
#2. implement Stack
lifo = deque()
lifo.append(l)
lifo.pop()

Heap

The (binary) heap data structure is an array that we can view as a nearly complete binary tree. The tree is completely filled on all levels except possibly the lowest, which is filled from left up to a point.(a) binary tree (b) an array

As we can see we can implement either the max-heap or the min-heap as an array. Because the tree is complete, the left child of a parent (at position p) is the node that is found in position 2p in the list. Similarly, the right child of the parent is at position 2p+1 in the list. To find the parent of any node in the tree, we can simply use Python’s integer division. Given that a node is at position n in the list, the parent is at position n/2.

There are two kinds of binary heaps: max-heaps and min-heaps. In both kinds, the values in the nodes satisfy a heap property. For max-heap, the property states as for every node i other than root.

A[PARENT(i)]>= A[i];

Thus, the largest element in a max-heap is stored at the root.

For a heap of n elements the height is theta(logn) because it is a complete binary tree.

Heap can be used into heapsort and a priority-queue data structure. Operations include:

MAX-HEAPIFY, runs in O(lgn), is the key to maintaining the max-heap property
BUILD-MAX-HEAP, runs in linear time, produces a maxheap from an unordered input arrary
MAX-HEAP-INSERT, HEAP-EXTRACT-MAX, HEAP-INCREASE-KEY, and HEAP-MAXIMUM, runs in O(lgn) time, allow the heap data structure to implement a priority queue.

heapq: heapq from collections is an implementation of heap, which can be used to maintain a priority queue. Operations include heappush, heappop, and nsmallest. heapq in python to maintain a priority queue with O(logn)

heap = []
heappush(heap, 5)
heappush(heap, 3)
heappush(heap, 7)
heappush(heap, 4)

Items are removed by the highest priority or say the lowest number first. Also, accessing the 0 index of the heap will return the smallest item.

print(heappop(a), heappop(a), heappop(a), heappop(a))
3 4 5 7
assert heap[0] == nsmallest(1, heap)[0] == 3

More materials can be found here.

Monotonous Stack: For monotonous increasing stack, which only allow the increasing element to be put in the stack, smaller one came will kick out the larger one in the previous position, untill we found one that is smaller than the current element. For the monotonous decreasing stack, the larger elements will force the stack to kick out the previous smaller element untill larger one found. In monotonous stack, we only operate at the end of the stack. To summarize, monotonous stack has two features:

monotonic
when adding new element, we will delete all the previous elements that break the first monotonic feature.
For increasing stack: we can find the first element to the left that is larger than current element
For decreasing stack: we can find the first element to the left that is smaller than current element.

3. Linked List

Like arrays, Linked List is a linear data structure. Unlike arrays, linked list elements are not stored at contiguous location; the elements are linked using pointers.

Why Linked List? Arrays can be used to store linear data of similar types, but arrays have following limitations. 1) The size of the arrays is fixed: So we must know the upper limit on the number of elements in advance. Also, generally, the allocated memory is equal to the upper limit irrespective of the usage. 2) Inserting a new element in an array of elements is expensive, because room has to be created for the new elements and to create room existing elements have to shifted.

Advantages over arrays 1) Dynamic size 2) Ease of insertion/deletion

Drawbacks: 1) Random access is not allowed. We have to access elements sequentially starting from the first node. So we cannot do binary search with linked lists. 2) Extra memory space for a pointer is required with each element of the list.

For Linked List, we can only iterate over elements, for python code example:

#Definition for singly-linked list.
class ListNode(object):
     def __init__(self, x):
         self.val = x
         self.next = None#First, construct a head and point and record the head
pointer=head = ListNode(None)
#loop through all node
while pointer:
    print pointer.val
    pointer = pointer.next

Dummy Node in Linked List

Remove Duplicates from Sorted List II
Reverse Linked List II
Partition List

Basic Linked List Skills

Sort List
Reorder List

Two Pointers in Linked List (Fast-slow pointers)

Merge K Sorted Lists

4. Tree

Binary Tree and Binary Search Tree, please see my following post.Solving Tree Problems on LeetCodePart of this great node comes from blog:medium.com

Reference

[1] Cormen, Thomas H. Introduction to algorithms. MIT press, 2009.

[2] Géron, Aurélien. Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems. “ O’Reilly Media, Inc.”, 2017.

[3]https://www.geeksforgeeks.org/heap-queue-or-heapq-in-python/

Hash Table and HashMap in Python

Data requires a number of ways in which it can be stored and accessed. One of the most important implementations includes Hash Tables. In Python, these Hash tables are implemented through the built-in data type i.e, dictionary. In this article, you will learn what are Hash Tables and Hashmaps in Python and how you can implement them using dictionaries.

Before moving ahead, let us take a look at all the topics of discussion:

What is a Hash table or a Hashmap in Python?
Hash table vs Hashmap
Creating Dictionaries
Creating Nested Dictionaries
Performing Operations on Hash Tables using dictionaries

Accessing Values
Updating Values
Deleting Items

Converting a Dictionary into a Dataframe

In computer science, a Hash table or a Hashmap is a type of data structure that maps keys to its value pairs (implement abstract array data types). It basically makes use of a function that computes an index value that in turn holds the elements to be searched, inserted, removed, etc. This makes it easy and fast to access data. In general, hash tables store key-value pairs and the key is generated using a hash function.

Hash tables or has maps in Python are implemented through the built-in dictionary data type. The keys of a dictionary in Python are generated by a hashing function. The elements of a dictionary are not ordered and they can be changed.

An example of a dictionary can be a mapping of employee names and their employee IDs or the names of students along with their student IDs.

Moving ahead, let’s see the difference between the hash table and hashmap in Python.

Dictionaries can be created in two ways:

Using curly braces ({})
Using the dict() function

Using curly braces:

Dictionaries in Python can be created using curly braces as follows:

EXAMPLE:

my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
print(my_dict)
type(my_dict)

OUTPUT:

{‘Dave’: ‘001’, ‘Ava’: ‘002’, ‘Joe’: ‘003’} dict

Using dict() function:

Python has a built-in function, dict() that can be used to create dictionaries in Python. For example:

EXAMPLE:

new_dict=dict()
print(new_dict)
type(new_dict)

OUTPUT:

{} dict

In the above example, an empty dictionary is created since no key-value pairs are supplied as a parameter to the dict() function. In case you want to add values, you can do as follows:

EXAMPLE:

new_dict=dict(Dave = '001' , Ava= '002' , Joe= '003')
print(new_dict)
type(new_dict)

OUTPUT:

{‘Dave’: ‘001’, ‘Ava’: ‘002’, ‘Joe’: ‘003’} dict

Nested dictionaries are basically dictionaries that lie within other dictionaries. For example:

EXAMPLE:

emp_details = {'Employee': {'Dave': {'ID': '001',
 'Salary': 2000,
 'Designation':'Python Developer'},
 'Ava': {'ID':'002',
 'Salary': 2300,
 'Designation': 'Java Developer'},
 'Joe': {'ID': '003',
 'Salary': 1843,
 'Designation': 'Hadoop Developer'}}}

There are a number of operations that can be performed on has tables in Python through dictionaries such as:

Accessing Values
Updating Values
Deleting Element

Accessing Values:

The values of a dictionary can be accessed in many ways such as:

Using key values
Using functions
Implementing the for loop

Using key values:

Dictionary values can be accessed using the key values as follows:

EXAMPLE:

my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
my_dict\['Dave'\]

OUTPUT: ‘ 001′

Using functions:

There are a number of built-in functions that can be used such as get(), keys(), values(), etc.

EXAMPLE:

my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
print(my_dict.keys())
print(my_dict.values())
print(my_dict.get('Dave'))

OUTPUT:

dict_keys(‘Dave’, ‘Ava’, ‘Joe’‘Dave’,‘Ava’,‘Joe’) dict_values(‘001’, ‘002’, ‘003’‘001’,‘002’,‘003’) 001

Implementing the for loop:

The for loop allows you to access the key-value pairs of a dictionary easily by iterating over them. For example:

my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
print("All keys")
for x in my_dict:
 print(x) #prints the keys
print("All values")
for x in my_dict.values():
 print(x) #prints values
print("All keys and values")
for x,y in my_dict.items():
 print(x, ":" , y) #prints keys and values

OUTPUT:

All keys Dave Ava Joe All values 001 002 003 All keys and values Dave : 001 Ava : 002 Joe : 003

Dictionaries are mutable data types and therefore, you can update them as and when required. For example, if I want to change the ID of the employee named Dave from ‘001’ to ‘004’ and if I want to add another key-value pair to my dictionary, I can do as follows:

EXAMPLE:

my_dict={'Dave' : '001' , 'Ava': '002' , 'Joe': '003'}
my_dict\['Dave'\] = '004' #Updating the value of Dave
my_dict\['Chris'\] = '005' #adding a key-value pair
print(my_dict)

OUTPUT: {‘Dave’: ‘004’, ‘Ava’: ‘002’, ‘Joe’: ‘003’, ‘Chris’: ‘005’}

There a number of functions that allow you to delete items from a dictionary such as del(), pop(), popitem(), clear(), etc. For example:

EXAMPLE:

my_dict={'Dave': '004', 'Ava': '002', 'Joe': '003', 'Chris': '005'}
del my_dict\['Dave'\] #removes key-value pair of 'Dave'
my_dict.pop('Ava') #removes the value of 'Ava'
my_dict.popitem() #removes the last inserted item
print(my_dict)

OUTPUT: {‘Joe’: ‘003’}

The above output shows that all the elements except ‘Joe: 003’ have been removed from the dictionary using the various functions.

As you have seen previously, I have created a nested dictionary containing employee names and their details mapped to it. Now to make a clear table out of that, I will make use of the pandas library in order to put everything as a dataframe.

EXAMPLE:

import pandas as pd
emp_details = {'Employee': {'Dave': {'ID': '001',
 'Salary': 2000,
 'Designation':'Python Developer'},
 'Ava': {'ID':'002',
 'Salary': 2300,
 'Designation': 'Java Developer'},
 'Joe': {'ID': '003',
 'Salary': 1843,
 'Designation': 'Hadoop Developer'}}}
df=pd.DataFrame(emp_details\['Employee'\])
print(df)

OUTPUT: