D1-Graphs I
Last updated
Last updated
****
Graphs are collections of related data. They’re like trees, except connections can be made from any node to any other node, even forming loops. By this definition, all trees are graphs, but not all graphs are trees.
We call the nodes in a graph vertexes (or vertices or verts), and we call the connections between the verts edges.
An edge denotes a relationship or linkage between the two verts.
Graphs can represent any multi-way relational data.
A graph could show a collection of cities and their linking roads.
It could show a collection of computers on a network.
It could show a population of people who know each other and Kevin Bacon (Links to an external site.).
It could represent trade relationships between nations.
It could represent the money owed in an ongoing poker night amongst friends.
And so on.
Directed and Undirected Graphs
The nature of the relationship that we represent determines if we should use a directed or undirected graph. If we could describe the relationship as "one way", then a directed graph makes the most sense. For example, representing the owing of money to others (debt) with a directed graph would make sense.
Directed graphs can also be bidirectional. For example, road maps are directed since all roads are one-way; however, most streets consist of lanes in both directions.
If the relationship's nature is a mutual exchange, then an undirected graph makes the most sense. For example, we could use an undirected graph to represent users who have exchanged money in the past. Since an "exchange" relationship is always mutual, an undirected graph makes the most sense here.
Cyclic and Acyclic Graphs
If you can form a cycle (for example, follow the edges and arrive again at an already-visited vert), the graph is cyclic. For instance, in the image below, you can start at B and then follow the edges to C, E, D, and then back to B (which you’ve already visited).
Note: any undirected graph is automatically cyclic since you can always travel back across the same edge.
If you cannot form a cycle (for example, you cannot arrive at an already-visited vert by following the edges), we call the graph acyclic. In the example below, no matter which vert you start at, you cannot follow edges in such a way that you can arrive at an already-visited vert.
Weighted Graphs
Weighted graphs have values associated with the edges. We call the specific values assigned to each edge weights.
The weights represent different data in different graphs. In a graph representing road segments, the weights might represent the length of the road. The higher the total weight of a route on the graph, the longer the trip is. The weights can help decide which particular path we should choose when comparing all routes.
We can further modify weights. For example, if you were building a graph representing a map for bicycle routes, we could give roads with bad car traffic or very steep inclines unnaturally large weights. That way, a routing algorithm would be unlikely to take them. (This is how Google Maps avoids freeways when you ask it for walking directions.)
Note: Djikstra's Algorithm (Links to an external site.) is a graph search variant that accounts for edge weights.
Directed Acyclic Graphs (DAGs)
A directed acyclic graph (DAG) is a directed graph with no cycles. In other words, we can order a DAG’s vertices linearly in such a way that every edge is directed from earlier to later in the sequence.
A DAG has several applications. DAGs can model many different kinds of information. Below is a small list of possible applications:
A spreadsheet where a vertex represents each cell and an edge for where one cell's formula uses another cell's value.
The milestones and activities of largescale projects where a topological ordering can help optimize the projects' schedule to use as little time as possible.
Collections of events and their influence on each other like family trees or version histories.
It is also notable that git uses a DAG to represent commits. A commit can have a child commit, or more than one child commit (in a branch). A child could come from one parent commit or two (in the case of a merge). But there’s no way to go back and form a repeating loop in the git commit hierarchy.
Before you draw graphs on your own, let's draw some graphs together. For each graph, we will have a description.
Draw an undirected graph of 8 verts.
Remember, from our definitions above that an undirected graph has bidirectional edges. So, we can draw eight verts and then connect them with solid lines (not arrows) anyway we see fit.
Draw a directed graph of 7 verts.
A directed graph has at least one edge that is not bidirectional. So, again, we can draw our seven verts and then connect them with edges. This time, we need to make sure that one of the edges is an arrow pointing in only one direction.
Draw a cyclic directed graph of 5 verts.
This drawing will be similar to one for Exercise 2 because it is a directed graph. However, in this graph, we also need to ensure that it has at least one cycle. Remember that a cycle is when you can follow the graph's edges and arrive at a vertex that you've already visited.
To draw this graph, we will draw our five verts and then draw our edges, making sure that we create at least one cycle.
Draw a directed acyclic graph (DAG) of 8 verts.
Again, this graph will be directed. The difference is that it will be acyclic—we can order a DAG’s vertices linearly so that every edge is directed from earlier to later in the sequence.
For this graph, we will draw our eight verts in a line from left to right. We will then draw our edges, making sure that the edges always point from left to right (earlier to later in the sequence).
Draw one graph for each of the descriptions below:
Undirected graph of 4 verts.
Directed graph of 5 verts.
Cyclic directed graph of 6 verts.
DAG of 7 verts.
Two common ways to represent graphs in code are adjacency lists and adjacency matrices. Both of these options have strengths and weaknesses. When deciding on a graph implementation, it's essential to understand what type of data you will store and what operations you need to run on the graph.
Below is an example of how we would represent a graph with an adjacency matrix and an adjacency list. Notice how we represent the relationship between verts C and D when using each type.
Adjacency List
In an adjacency list, the graph stores a list of vertices. For each vertex, it holds a list of each connected vertex.
Below is a representation of the graph above in Python:
Notice that this adjacency list doesn't use any lists. The vertices
collection is a dictionary
which lets us access each collection of edges in O(1) constant time. Because a set
contains the edges, we can check for edges in O(1) constant time.
Adjacency Matrix
Here is the representation of the graph above in an adjacency matrix:
We represent this matrix as a two-dimensional array–a list of lists. With this implementation, we get the benefit of built-in edge weights. 0
denotes no relationship, but any other value represents an edge label or edge weight. The drawback is that we do not have a built-in association between the vertex values and their index.
In practice, implementing both the adjacency list and adjacency matrix would contain more information by including Vertex
and Edge
classes.
Adjacency matrices and adjacency lists have strengths and weaknesses. Let's explore their tradeoffs by comparing their attributes and the efficiency of operations.
In all the following examples, we are using the following shorthand to denote the graph's properties:
Shorthand
Property
V
Total number of vertices in the graph
E
Total number of edges in the graph
e
Average number of edges per vertex
Space Complexity
Adjacency Matrix
Complexity: O(V^2)
space
Consider a dense graph where each vertex points to each other vertex. Here, the total number of edges will approach V^2. This fact means that regardless of whether you choose an adjacency list or an adjacency matrix, both will have a comparable space complexity. However, dictionaries and sets are less space-efficient than lists. So, for dense graphs (graphs with a high average number of edges per vertex), the adjacency matrix is more efficient because it uses lists instead of dictionaries and sets.
Adjacency List
Complexity: O(V+E)
space
Consider a sparse graph with 100 vertices and only one edge. An adjacency list would have to store all 100 vertices but only needs to keep track of that single edge. The adjacency matrix would need to store 100x100=10,000 connections, even though all but one would be 0.
Takeaway: The worst-case storage of an adjacency list occurs when the graph is dense. The matrix and list representation have the same complexity (O(V^2)
). However, for the general case, the list representation is usually more desirable. Also, since finding a vertex's neighbors is a common task, and adjacency lists make this operation more straightforward, it is most common to use adjacency lists to represent graphs.
Add Vertex
Adjacency Matrix
Complexity: O(V)
time
For an adjacency matrix, we would need to add a new value to the end of each existing row and add a new row.
for v in self.edges: self.edges[v].append(0) v.append([0] * len(self.edges + 1))
Remember that with Python lists, appending to the end of a list is O(1)
because of over-allocation of memory but can be O(n)
when the over-allocated memory fills up. When this occurs, adding the vertex can be O(V^2)
.
Adjacency List
Complexity: O(1)
time
Adding a vertex is simple in an adjacency list:
Adding a new key to a dictionary is a constant-time operation.
Takeaway: Adding vertices is very inefficient for adjacency matrices but very efficient for adjacency lists.
Remove Vertex
Adjacency Matrix
Complexity: O(V^2)
Removing vertices is inefficient in both representations. In an adjacency matrix, we need to remove the removed vertex's row and then remove that column from each row. Removing an element from a list requires moving everything after that element over by one slot, which takes an average of V/2
operations. Since we need to do that for every single row in our matrix, that results in V^2
time complexity. We need to reduce each vertex index after our removed index by one as well, which doesn't add to our quadratic time complexity but adds extra operations.
Adjacency List
Complexity: O(V)
We need to visit each vertex for an adjacency list and remove all edges pointing to our removed vertex. Removing elements from sets and dictionaries is an O(1)
operation, resulting in an overall O(V)
time complexity.
Takeaway: Removing vertices is inefficient in both adjacency matrices and lists but more efficient in lists.
Add Edge
Adjacency Matrix
Complexity: O(1)
Adding an edge in an adjacency matrix is simple:
Adjacency List
Complexity: O(1)
Adding an edge in an adjacency list is simple:
Both are constant-time operations.
Takeaway: Adding edges to both adjacency matrices and lists is very efficient.
Remove Edge
Adjacency Matrix
Complexity: O(1)
Removing an edge from an adjacency matrix is simple:
Adjacency List
Complexity: O(1)
Removing an edge from an adjacency list is simple:
Both are constant-time operations.
Takeaway: Removing edges from both adjacency matrices and lists is very efficient.
Find Edge
Adjacency Matrix
Complexity: O(1)
Finding an edge in an adjacency matrix is simple:
Adjacency List
Complexity: O(1)
Finding an edge in an adjacency list is simple:
Both are constant-time operations.
Takeaway: Finding edges in both adjacency matrices and lists is very efficient.
Get All Edges from Vertex
You can use several commands if you want to know all the edges originating from a particular vertex.
Adjacency Matrix
Complexity: O(V)
In an adjacency matrix, this is complicated. You would need to iterate through the entire row and populate a list based on the results:
Adjacency List
Complexity: O(1)
With an adjacency list, this is as simple as returning the value from the vertex dictionary:
Takeaway: Fetching all edges is less efficient in an adjacency matrix than an adjacency list.
Summary
Let's summarize all this complexity information in a table:
type
Space
Add Vert
Remove Vert
Add Edge
Remove Edge
Find Edge
Get All Edges
Matrix
O(V^2)
O(V)
O(V^2)
O(1)
O(1)
O(1)
O(V)
List
O(V+E)
O(1)
O(V)
O(1)
O(1)
O(1)
O(1)
In most practical use-cases, an adjacency list will be a better choice for representing a graph. However, it is also crucial that you be familiar with the matrix representation. Why? Because there are some dense graphs or weighted graphs that could have better space efficiency when represented by a matrix.
Together, we will now use the graph shown in the picture above and represent it in both an adjacency list and an adjacency matrix.
First, the adjacency list:
The difference between this implementation and the previous adjacency list is that this representation allows our edges to have weights.
Now, we need to implement an adjacency matrix. Remember, that one benefit of the matrix is how easy it is to represent edge weights:
Using the graph shown in the picture above, write python code to represent the graph in an adjacency list.
Using the same graph you used for the first exercise, write python code to represent the graph in an adjacency matrix.
Write a paragraph that compares and contrasts the efficiency of your different representations.
We will now use dictionaries to implement the graph abstract data type in Python. We need to have two classes. First, the Graph
class that will keep track of the vertices in the graph instance. Second, the Vertex
class, which we will use to represent each vertex contained in a graph. Both classes will have methods that allow you to complete the basic operations for interacting with graphs and vertices.
Vertex
ClassLet's start by defining a Vertex
class and defining its initialization method (__init__
) and its __str__
method so we can print out a human-readable string representations of each vertex:
The next thing we need for our Vertex
class is a way to other vertices that are connected and the weight
of the connection edge. We will call this method add_connection
.
Let's now add three methods that allow us to get data out of our Vertex
instance objects. These three methods will be get_connections
(retrieves all currently connected vertices), get_value
(retrieves the value of the vertex instance), and get_weight
(gets the edge weight from the vertex to a specified connected vertex).
We've finished our Vertex
class. Now, let's work on our Graph
class.
Graph
ClassOur graph class's primary purpose is to be a way that we can map vertex names to specific vertex objects. We also want to keep track of the number of vertices that our graph contains using a count
property. We will do so using a dictionary. Let's start by defining an initialization method (__init__
).
Next, we need a way to add vertices to our graph. Let's define an add_vertex
method.
We also need a way to add an edge to our graph. We need a method that can create a connection between two vertices and specify the edge's weight. Let's do so by defined an add_edge
method.
Next, we need a way to retrieve a list of all the vertices in our graph. We will define a method called get_vertices
.
Last, we will override a few built-in methods (__contains__
and __iter__
) that are available on objects to make sure they work correctly with Graph
instance objects.
Let's go ahead and test our class definitions and build up a graph structure in a Python interactive environment.
Load the Vertex
class and Graph
class into an interactive Python environment and use the classes to create an instance of the graph shown below.