##### A Model For Parsing Data

Learning to code comes with a lot of highs and lows. We experience them, as developers, all the...

In this blog we will be covering a few of the main principles surrounding the graph data structure. Graphs are nonlinear data structures composed of nodes connected to each other by edges and they’re often used to model many real-life applications (social networks like Facebook, GPS navigation services like Google Maps, and even the World Wide Web).

But before that, what exactly are data structures? Data structures are simply arrangements to store and organize data for efficient manipulation and retrieval. Most of you have probably worked with or heard of at least one of the following linear data structures:

- Arrays
- Stacks
- Queues
- Linked Lists

Linear data structures are simple to understand and great for arranging data in a sequential manner where the elements are added in order. But what if an application required us to model a complex social network and store information about each user and his/her relationships? This is where graphs shine!

WHAT ARE GRAPHS?

A graph is a nonlinear data structure composed of objects (called **nodes** or **vertices**) connected to each other by **edges** that can be used to model many real-life applications:

- Social Networks (Facebook, Instagram, and LinkedIn)
- GPS navigation (Google Maps)
- World Wide Web (PageRank based on directed links between web pages)

Nodes (or vertices) represent the individual members of the network (e.g. users on Facebook or locations/addresses in GPS) and edges represent the connections between the nodes (e.g. friendships on Facebook or roads between your location and the destination on Google Maps).

Fun fact: In graph theory, you may have seen graphs written as G = (V, E), which is just a fancy way of saying that a graph is a collection of vertices and edges.

TYPES OF GRAPHS

Directed vs Undirected

Graphs can be **directed** (digraphs) or **undirected** which describe whether their edges have a direction associated with them. A directed edge indicates a one-way relationship, while an undirected edge indicates a two-way relationship between nodes.

For example, we may use a directed graph to model Instagram followers: Alice can follow Bob, but Bob does not necessarily have to follow back.

On the other hand, friends on Facebook may be modeled as an undirected graph since the relationship is mutual: Alice is Bob’s friend which also means Bob is Alice’s friend.

Technically, graphs can have both directed and undirected edges, but for simplicity, we will be covering graphs that are exclusively one type. Also note that undirected edges are drawn without arrows despite being bidirectional in nature. If we had arrows in both directions in the undirected edge, it would indicate we have two directed edges (one in each direction) as opposed to a single undirected edge.

Fun fact: A directed edge from node A to node B is represented by an ordered pair (A, B), while an undirected edge between node A and node B is represented by an unordered pair {A, B}. Note how the pairs are wrapped: ordered in parentheses and unordered (set) in curly braces.

Weighted vs Unweighted

Graphs can also be **weighted** or **unweighted** which describes whether their edges have a value associated with them. Weights can vary depending on the graph use case (e.g. lengths, costs, time, favorability, etc). To illustrate this, take the following intercity road network as an example. Note that each node is a city and each (undirected) edge is a bi-directional road connecting two cities.

If we wanted to find the shortest path (a common graph problem) between City A and City C, what path should we take? There are only two distinct possible paths we can take from City A:

Since we aren’t provided any information about the weights for each edge, we can assume that each edge is weighted evenly in this unweighted graph and so we should opt for Path 2 (traverse one edge) over Path 1 (traverse two edges).

But not all edges are built the same. In the real world, roads will have different lengths. We can model this by creating a weighted graph and assigning each edge a weight (in miles).

Revisiting the shortest path problem from City A to City C, instead of simply summing up the number of edges to calculate the distances of the paths, we now have to take into account the weights of the edges. In the new weighted graph, we should opt for Path 1 (traverse 125 miles) over Path 2 (traverse 150 miles).

Dense vs Sparse

Graphs can be **dense** or **sparse** which describes the number of edges present in the graph relative to the maximum possible number of edges. There is no clear standard cutoff between dense and sparse graphs and it depends on the context, but to put it simply, if we have many edges we have a dense graph and if we have very few edges we have a sparse graph.

How do we compute the maximum possible number of edges in a graph? We would need to create an edge from each node to each of the other nodes.

This is much simpler to illustrate with an example. Let’s consider a social network like Instagram of only 100 people. Since we are modeling the concept of followers, we should use a directed graph. Each person can follow up to a maximum of 99 other people. Since we have 100 people in our network, then the maximum number of edges in our graph is 100*99 = 9,900. On the other hand, for an undirected graph, the maximum number of edges in our graph would be half that amount (or 4,500) to avoid double-counting pairwise relationships (e.g. if Alice is Bob’s friend, we do not want to redundantly consider again that Bob is Alice’s friend).

REPRESENTING GRAPHS

The two most common ways to represent graphs in practice are:

- Adjacency Matrix
- Adjacency List

For simplicity, we will be using the following undirected and unweighted graph:

We won’t be implementing any code in this section, but we will provide a conceptual foundation enough to get you started on your own.

Adjacency Matrix (better for dense graphs)

Given n nodes in a graph, an adjacency matrix is a two-dimensional n x n array whose indices are mapped to each node. If we call the adjacency matrix A, then the value stored at A[i][j] should be 1 if an edge is present between node i and node j, and 0 otherwise.

For example, A[0][1] = 0 because an edge does not exist between node 0 and node 1. On the other hand, A[0][2] = 1 because an edge exists between node 0 and node 2.

Since our graph is unweighted, note that the values stored in the adjacency matrix are currently binary to indicate whether an edge is present/absent between two nodes. If we decided to add weights to our edges in the current graph, we would replace our values in our adjacency matrix with the numeric weights instead and use an arbitrary special value (like zero or infinity) to indicate no edge.

One advantage of using an adjacency matrix is its constant lookup to determine whether two nodes in the graph are connected. The disadvantage, however, is that we have to store an n x n matrix even for graphs that do not contain that many edges. In other words, the more 0s (or absence of edges) in our matrix, the more wasteful space we are using in our adjacency matrix.

Fun fact: Adjacency matrices for undirected graphs are symmetric, meaning A[i][j] = A[j][i].

Adjacency List (better for sparse graphs)

Given n nodes in a graph, an adjacency list is a size n array of linked lists. If we call the adjacency list A, then the linked list at A[i] stores all the destination nodes of node i. See adjacency list below where the indices of the array are represented in green.

For sparse graphs, adjacency lists are more memory efficient than adjacency matrix since we do not unnecessarily store information about every absence of edges between nodes. However, determining whether two nodes are connected in an adjacency list will have a time complexity of O(n) where n is the number of nodes since we would have to perform at worst case a linear search through all the nodes in the graph.

SUMMARY

Graphs are powerful and flexible data structures that we can use to model complex real-world systems and networks. We covered only a few important concepts about graphs: Directed vs Undirected, Weighted vs Unweighted, Dense vs Sparse, and Adjacency Matrix vs Adjacency List, but there are others that we haven’t yet explored -- Graph Cycles, Breadth-First Search, Depth-First Search, and many more!

Stay tuned and keep an eye out for my future posts!

If you’re interested in learning more about graphs, below are some amazing resources that were referenced for this blog:

Learning to code comes with a lot of highs and lows. We experience them, as developers, all the...