SciPy CSGraph – Compressed Sparse Graph
Graphs are mathematical structures that represent relationships between entities in many fields, including computer science, social networks and transportation systems. Analyzing graphs is a fundamental task in many applications. It becomes challenging with large graphs that have sparse connectivity. The scipy.sparse.csgraph subpackage of the SciPy library provides tools and algorithms for graph analysis on sparse matrix representations. In a sparse matrix, most elements are zero, so only the nonzero entries (the edges) need to be stored. This makes sparse matrices well suited to large graphs with relatively few edges per node.
Note: Before going further, you should know how to create a sparse matrix in Python (refer to the article How to Create a Sparse Matrix in Python).
Key Functionalities of SciPy CSGraph
The scipy.sparse.csgraph subpackage offers a wide range of algorithms for graph analysis. Let's look at its key features:
- Shortest Path Algorithms
- Dijkstra's Algorithm: Find shortest paths from one or more source nodes with dijkstra, or with shortest_path(method='D'). Edge weights must be non-negative.
- Bellman-Ford Algorithm: Compute shortest paths on graphs with negative edge weights with bellman_ford. A NegativeCycleError is raised if the graph contains a negative cycle.
- Floyd-Warshall Algorithm: Compute shortest paths between all pairs of nodes with floyd_warshall.
- Connected Components
- connected_components: Identify the connected components of a graph, returning the number of components and a component label for each node.
- Minimum Spanning Tree
- minimum_spanning_tree: Compute the minimum spanning tree of a graph, which is the subset of edges that connects all connected nodes with the minimum total weight. If the graph is disconnected, the result is a minimum spanning forest. It accepts dense arrays as well as sparse matrices such as Compressed Sparse Row (CSR) matrices.
- Strongly Connected Components
- connected_components(csgraph, directed=True, connection='strong'): Identify the strongly connected components of a directed graph. With connection='weak' (the default), edge directions are ignored when grouping nodes.
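The sketch below calls the functions listed above directly on a small hypothetical directed graph. It runs dijkstra and floyd_warshall on non-negative weights, and bellman_ford on a second hypothetical graph that contains a negative edge. It also calls connected_components in both 'weak' and 'strong' mode. Both graphs are illustrative assumptions, not taken from the article.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import (bellman_ford, connected_components,
                                  dijkstra, floyd_warshall)

# Hypothetical directed graph: 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (5), 3->1 (1)
graph = csr_matrix(np.array([[0, 4, 1, 0],
                             [0, 0, 0, 5],
                             [0, 2, 0, 0],
                             [0, 1, 0, 0]], dtype=float))

# Single-source shortest paths from node 0 (all weights are non-negative)
dist_from_0 = dijkstra(graph, indices=0)
print(dist_from_0)  # [0. 3. 1. 8.]

# All-pairs shortest paths; row 0 matches the Dijkstra result
all_pairs = floyd_warshall(graph)
print(all_pairs[0])  # [0. 3. 1. 8.]

# Hypothetical graph with a negative edge 2->1 (-3): use Bellman-Ford instead
neg_graph = csr_matrix(np.array([[0, 4, 5],
                                 [0, 0, 0],
                                 [0, -3, 0]], dtype=float))
print(bellman_ford(neg_graph, indices=0))  # [0. 2. 5.]

# Weak components ignore direction; strong components need mutual reachability
n_weak, _ = connected_components(graph, directed=True, connection="weak")
n_strong, labels = connected_components(graph, directed=True, connection="strong")
print(n_weak, n_strong)  # 1 3
print(labels[1] == labels[3])  # True: nodes 1 and 3 reach each other
```

Component label numbers are arbitrary, so compare labels with each other rather than against fixed values.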
Creating CSGraph From Adjacency Matrix
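- Define an adjacency matrix that represents the connectivity of the graph. Entry [i, j] holds the weight of the edge from node i to node j, and 0 means there is no edge.
- Either wrap the matrix in a sparse format such as CSR or CSC (csgraph routines accept these directly), or pass the dense array to csgraph_from_dense, which returns the graph as a CSR matrix.
- The resulting graph is directed: a nonzero entry [i, j] is an edge from i to j only.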
In this example, we use NumPy and SciPy to create an empty 3 x 3 matrix and convert it into a graph. Because every entry is 0, the graph has no edges.
Python3
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import csgraph_from_dense

# Create an empty 3 x 3 sparse matrix, then convert it to a dense array,
# because csgraph_from_dense expects dense input
dense_matrix = csr_matrix((3, 3), dtype=np.int8).toarray()

# Convert the dense array to a graph (a CSR matrix); zeros mean "no edge"
graph = csgraph_from_dense(dense_matrix)
print(graph.toarray())
Output:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
In this example, we first define the adjacency matrix of a directed graph and then convert it into a sparse graph.
Python3
import numpy as np
from scipy.sparse.csgraph import csgraph_from_dense

# Define the adjacency matrix for a directed graph
adjacency_matrix = [[0, 1, 0, 1],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1],
                    [0, 0, 0, 0]]

# Convert the dense adjacency matrix to a sparse graph (CSR format)
graph = csgraph_from_dense(np.array(adjacency_matrix))

# Each printed line has the form: (source, destination) edge-weight
print(graph)
Output:
(0, 1) 1.0
(0, 3) 1.0
(1, 2) 1.0
(2, 3) 1.0
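csgraph_from_dense treats entries equal to null_value (0 by default) as missing edges. As a result, a genuine zero-weight edge silently disappears. The sketch below uses a hypothetical 3-node graph in which nodes 0 and 2 are joined by an edge of weight 0. It marks missing edges with np.inf and passes null_value=np.inf, so the zero-weight edge is stored explicitly.

```python
import numpy as np
from scipy.sparse.csgraph import csgraph_from_dense

# Hypothetical undirected graph: edge 0-1 has weight 2, edge 0-2 has weight 0.
# np.inf marks "no edge", because 0 is a real weight here.
dense = np.array([[np.inf, 2, 0],
                  [2, np.inf, np.inf],
                  [0, np.inf, np.inf]])

# Default null_value=0 (inf is also treated as null): the zero-weight edge is lost
lossy = csgraph_from_dense(dense)
print(lossy.nnz)  # 2

# null_value=np.inf: only inf means "no edge", so zero weights are kept explicitly
kept = csgraph_from_dense(dense, null_value=np.inf)
print(kept.nnz)   # 4
print(kept.data)  # [2. 0. 2. 0.]
```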
Creating CSGraph from Edge List
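- Define an edge list that represents the connectivity of the graph, with one (source, destination, weight) entry per edge.
- Build a COO matrix from it with coo_matrix((weights, (sources, destinations)), shape=(N, N)), where N is the number of nodes.
- Convert it to CSR with .tocsr(). No csgraph_from_dense step is needed, because csgraph routines accept sparse matrices directly.
- The graph is directed.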
Python3
import numpy as np
from scipy.sparse import coo_matrix

# Edge list: (source, destination, weight) for each directed edge
edge_list = [(0, 1, 1.0), (0, 3, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
sources, destinations, weights = zip(*edge_list)

# Build a 4 x 4 COO matrix from the edge list and convert it to CSR
graph = coo_matrix((weights, (sources, destinations)), shape=(4, 4)).tocsr()
print(graph.toarray())
Output:
[[0. 1. 0. 1.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]
[0. 0. 0. 0.]]
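Building a graph from an edge list has one pitfall. If the same (source, destination) pair appears more than once, converting the COO matrix to CSR sums the duplicate weights. The sketch below uses a hypothetical edge list to show this, and then recovers the edge list from the CSR graph with .tocoo(). If parallel edges should keep only their smallest weight instead, you would have to deduplicate them before building the matrix.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical edge list; the edge 0 -> 1 is listed twice
sources = np.array([0, 0, 1])
destinations = np.array([1, 1, 2])
weights = np.array([2.0, 3.0, 4.0])

# Converting COO to CSR sums duplicate (source, destination) entries
graph = coo_matrix((weights, (sources, destinations)), shape=(3, 3)).tocsr()
print(graph[0, 1])  # 5.0

# Recover a (source, destination, weight) edge list from the CSR graph
coo = graph.tocoo()
edges = [(int(s), int(d), float(w)) for s, d, w in zip(coo.row, coo.col, coo.data)]
print(edges)  # [(0, 1, 5.0), (1, 2, 4.0)]
```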
Creating an Undirected Graph
To represent an undirected graph with scipy.sparse.csgraph, use a symmetric adjacency matrix, so that every edge is stored in both directions.
Symmetric Matrix: A matrix is symmetric when it is equal to its transpose. In other words, a square matrix A is symmetric if A[i][j] equals A[j][i] for every pair of indices i and j.
Python3
import numpy as np
from scipy.sparse.csgraph import csgraph_from_dense

# Define the adjacency matrix for an undirected graph
# Here 1 is the weight of the edge between the two nodes
adjacency_matrix = [[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]]

# Make the matrix symmetric by keeping the larger of A[i][j] and A[j][i]
# (this matrix is already symmetric, so it is left unchanged)
n = len(adjacency_matrix)
adjacency_matrix = [[max(adjacency_matrix[i][j], adjacency_matrix[j][i])
                     for j in range(n)]
                    for i in range(n)]

# Convert the dense adjacency matrix to a sparse graph (CSR format)
graph = csgraph_from_dense(np.array(adjacency_matrix))
print(graph)
Output:
(0, 1) 1.0
(0, 3) 1.0
(1, 0) 1.0
(1, 2) 1.0
(2, 1) 1.0
(2, 3) 1.0
(3, 0) 1.0
(3, 2) 1.0
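Storing both directions of every edge is one way to get an undirected graph. Most csgraph routines also accept directed=False, which lets each stored edge be traversed in either direction. The sketch below uses a hypothetical three-node path graph stored in one direction only. It checks that shortest_path with directed=False on that matrix agrees with a directed search on its symmetrized copy. The copy is built with the sparse .maximum() method, which mirrors the max() step in the example above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Hypothetical path graph 0 - 1 - 2, with each edge stored in one direction only
one_way = csr_matrix(np.array([[0, 1, 0],
                               [0, 0, 2],
                               [0, 0, 0]], dtype=float))

# Symmetrize: keep the larger of A[i, j] and A[j, i] for every pair
symmetric = one_way.maximum(one_way.T).tocsr()
print((symmetric != symmetric.T).nnz == 0)  # True: equal to its transpose

# directed=False on the one-way matrix vs. a directed search on the symmetric one
d_undirected = shortest_path(one_way, directed=False)
d_symmetric = shortest_path(symmetric, directed=True)
print(np.allclose(d_undirected, d_symmetric))  # True
```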
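Breadth-First Traversal
Syntax:
breadth_first_order(csgraph, i_start, directed=True, return_predecessors=True)
Parameters:
- csgraph : The N x N array or sparse matrix representing the input graph.
- i_start : (int) The index of the starting node.
- directed : (bool, optional) If True (default), edges are followed only in their stored direction. If False, they can be traversed in both directions.
- return_predecessors : (bool, optional) If True (default), also return the predecessor array.
Return:
- node_array : (1-D ndarray) The breadth-first list of nodes, starting with the specified node. Its length is the number of nodes reachable from the starting node.
- predecessors : (1-D ndarray, returned only if return_predecessors=True) The parent of each node in the breadth-first tree. The starting node and nodes outside the tree get -9999.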
Python3
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import breadth_first_order

# Weighted directed graph: adjMat[i][j] is the weight of the edge i -> j
adjMat = [[0, 1, 2, 0],
          [0, 0, 0, 1],
          [2, 0, 0, 3],
          [0, 0, 0, 0]]
graph = csr_matrix(adjMat)
print(graph)

# BFS starting from node 0; return only the visiting order
bfs = breadth_first_order(graph, 0, return_predecessors=False)
print("Breadth-first travelling order:", bfs)
Output:
(0, 1) 1
(0, 2) 2
(1, 3) 1
(2, 0) 2
(2, 3) 3
Breadth-first travelling order: [0 1 2 3]
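When return_predecessors=True (the default), breadth_first_order also returns the parent of each node in the breadth-first tree. The value -9999 marks the start node and unreachable nodes. The sketch below rebuilds the example graph and follows those parent links back to reconstruct the path from node 0 to node 3. The helper path_to is hypothetical and written only for this illustration. BFS minimizes the number of edges, not the total weight, so this is the path with the fewest hops.

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import breadth_first_order

adjMat = [[0, 1, 2, 0],
          [0, 0, 0, 1],
          [2, 0, 0, 3],
          [0, 0, 0, 0]]
graph = csr_matrix(adjMat)

order, predecessors = breadth_first_order(graph, 0, return_predecessors=True)


def path_to(predecessors, target):
    """Hypothetical helper: walk parent links back from target to the start.

    If target was not reached, only [target] is returned.
    """
    path = [target]
    while predecessors[path[-1]] != -9999:
        path.append(int(predecessors[path[-1]]))
    return path[::-1]


print(order)                     # [0 1 2 3]
print(path_to(predecessors, 3))  # [0, 1, 3]
```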
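Depth-First Traversal
Syntax:
depth_first_order(csgraph, i_start, directed=True, return_predecessors=True)
Returns a depth-first ordering of the nodes reachable from the specified node. The parameters and return values are the same as for breadth_first_order.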
Python3
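from scipy.sparse.csgraph import depth_first_order

# DFS starting from node 1, reusing the graph from the BFS example above;
# edges are followed only in their stored direction (directed=True by default)
dfs = depth_first_order(graph, i_start=1, return_predecessors=False)
print("Depth First Travelling order:", dfs)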
Output:
Depth First Travelling order: [1 3]
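Only nodes 1 and 3 appear in the output above because edges are followed in their stored direction, and node 3 has no outgoing edges. The sketch below rebuilds the same graph and repeats the search with directed=False, so edges can also be traversed backwards. With that setting every node is reachable from node 1. The exact visiting order depends on SciPy's internal traversal, so the code checks only which nodes are reached.

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import depth_first_order

graph = csr_matrix([[0, 1, 2, 0],
                    [0, 0, 0, 1],
                    [2, 0, 0, 3],
                    [0, 0, 0, 0]])

# Directed search: only node 3 is reachable from node 1
directed_order = depth_first_order(graph, 1, directed=True,
                                   return_predecessors=False)

# Undirected search: edges may be followed in either direction
undirected_order = depth_first_order(graph, 1, directed=False,
                                     return_predecessors=False)

print(directed_order)                     # [1 3]
print(sorted(undirected_order.tolist()))  # [0, 1, 2, 3]
```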
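Shortest Path
Syntax:
shortest_path(csgraph, method='auto', directed=True, return_predecessors=False, unweighted=False, overwrite=False, indices=None)
Parameters:
- csgraph : The N x N array or sparse matrix of distances representing the input graph.
- method : (string ['auto'|'FW'|'D'|'BF'|'J'], optional) Algorithm to use for shortest paths. Options are:
- 'auto' – (default) select the best among 'FW', 'D', 'BF', or 'J' based on the input data.
- 'FW' – Floyd-Warshall algorithm. Computational cost is approximately O(N^3), and the input graph is converted to a dense representation.
- 'D' – Dijkstra's algorithm with Fibonacci heaps. Edge weights must be non-negative.
- 'BF' – Bellman-Ford algorithm. This algorithm can be used when edge weights are negative. An error is raised if a negative cycle is encountered.
- 'J' – Johnson's algorithm. Like the Bellman-Ford algorithm, it is designed for graphs with negative edge weights. It combines Bellman-Ford with Dijkstra's algorithm for faster computation.
- directed : (bool, optional)
- If True (default), find the shortest path on a directed graph: only move from node i to node j along csgraph[i, j].
- If False, find the shortest path on an undirected graph: move from node i to node j along either csgraph[i, j] or csgraph[j, i].
- return_predecessors : (bool, optional) If True, also return the predecessor matrix.
- indices : (array or int, optional) If specified, only compute the paths from the nodes at the given indices.
Returns:
- dist_matrix : (ndarray) The N x N matrix of distances between graph nodes. dist_matrix[i, j] gives the shortest distance from node i to node j along the graph. If indices is a single int, a 1-D array of distances from that node is returned.
- predecessors : (ndarray, returned only if return_predecessors=True) predecessors[i, j] is the node before j on the shortest path from i to j, or -9999 if there is no such path.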
Python3
from scipy.sparse.csgraph import shortest_path

# Shortest-path distances from node 1 to every other node,
# treating the graph from the BFS example as undirected
source = 1
dist1 = shortest_path(csgraph=graph, method="auto", directed=False, indices=source)
print(f"Distance from Node {source} to remaining Nodes", dist1)

# Shortest-path distances between all pairs of nodes (Floyd-Warshall)
dist_matrix = shortest_path(csgraph=graph, method="FW", directed=False)
print("Distances between all pairs of Nodes:")
print(dist_matrix)
Output:
Distance from Node 1 to remaining Nodes [1. 0. 3. 1.]
Distances between all pairs of Nodes:
[[0. 1. 2. 2.]
[1. 0. 3. 1.]
[2. 3. 0. 3.]
[2. 1. 3. 0.]]
- Output 1: dist1[j] is the shortest distance from the source node (node 1 in this example) to node j.
- Output 2: dist_matrix[i, j] is the length of the shortest path from node i to node j.
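By default shortest_path returns only distances. With return_predecessors=True it also returns, for each node, the previous node on its shortest path (-9999 where there is none). You can rebuild the route itself from these values. The sketch below rebuilds the BFS example graph, treats it as undirected as in the example above, and reconstructs the route from node 1 to node 2. The helper reconstruct is hypothetical.

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

graph = csr_matrix([[0, 1, 2, 0],
                    [0, 0, 0, 1],
                    [2, 0, 0, 3],
                    [0, 0, 0, 0]])

# With a single int index, both results are 1-D arrays for that source
dist, pred = shortest_path(graph, directed=False, indices=1,
                           return_predecessors=True)


def reconstruct(pred, target):
    """Hypothetical helper: follow predecessor links back to the source."""
    path = [target]
    while pred[path[-1]] != -9999:
        path.append(int(pred[path[-1]]))
    return path[::-1]


print(dist[2])               # 3.0
print(reconstruct(pred, 2))  # [1, 0, 2]
```

The route 1 -> 0 -> 2 (weight 1 + 2) beats 1 -> 3 -> 2 (weight 1 + 3), which matches the distance of 3 reported above.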
Minimum Spanning Tree
Syntax:
minimum_spanning_tree(csgraph, overwrite=False)
- A minimum spanning tree is a graph consisting of the subset of edges that together connect all connected nodes while minimizing the total sum of the edge weights. SciPy computes it with Kruskal's algorithm and treats the input graph as undirected.
Parameters:
- csgraph : (array or sparse matrix) The N x N matrix representing an undirected graph over N nodes.
- overwrite : (bool, optional) If True, parts of the input graph will be overwritten for efficiency. Default is False.
Return:
- span_tree : (csr_matrix) The N x N compressed-sparse representation of the undirected minimum spanning tree over the input.
Python3
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

X = csr_matrix([[0, 8, 0, 3],
                [0, 0, 2, 5],
                [0, 0, 0, 6],
                [0, 0, 0, 0]])

# Find the minimum spanning tree
Tcsr = minimum_spanning_tree(X)

# Print the minimum spanning tree as a dense matrix
print(Tcsr.toarray())
Output:
[[0. 0. 0. 3.]
[0. 0. 2. 5.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
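The tree above keeps the edges (0, 3), (1, 2) and (1, 3). The sketch below runs two quick sanity checks on it. First, a spanning forest over N nodes with c connected components has exactly N - c edges. Second, .sum() gives the tree's total weight: 10 here, out of 24 in the full graph. The code uses count_nonzero() rather than nnz so that any explicitly stored zeros would not be counted.

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree

X = csr_matrix([[0, 8, 0, 3],
                [0, 0, 2, 5],
                [0, 0, 0, 6],
                [0, 0, 0, 0]])
tree = minimum_spanning_tree(X)

# A spanning forest over N nodes and c components has exactly N - c edges
n_components, _ = connected_components(X, directed=False)
print(tree.count_nonzero() == X.shape[0] - n_components)  # True

# Total weight of the tree versus the whole graph
print(tree.sum(), X.sum())  # 10.0 24

# The kept edges as (row, column) pairs
rows, cols = tree.nonzero()
print(sorted(zip(rows.tolist(), cols.tolist())))  # [(0, 3), (1, 2), (1, 3)]
```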
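Maximum Flow
Syntax:
maximum_flow(csgraph, source, sink)
Parameters:
- csgraph : (csr_matrix) A directed graph in CSR format whose entry [i, j] is the integer capacity of the edge from node i to node j.
- source : (int) The source node.
- sink : (int) The sink (destination) node.
- Recent SciPy versions also accept a keyword-only method argument that selects the algorithm: 'dinic' (the default) or 'edmonds_karp'.
Return:
- An instance of the MaximumFlowResult class.
- Its main attributes are flow_value (the value of the maximum flow) and flow (the flow along each edge, as a sparse matrix).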
Python3
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

# Adjacency matrix of a directed graph; entry [i][j] is the capacity of edge i -> j
adjacency_matrix = [[0, 16, 13, 0, 0, 0],
                    [0, 0, 0, 12, 0, 0],
                    [0, 4, 0, 0, 14, 0],
                    [0, 0, 9, 0, 0, 20],
                    [0, 0, 0, 7, 0, 4],
                    [0, 0, 0, 0, 0, 0]]

# maximum_flow requires a CSR matrix with integer capacities
graph_sparse = csr_matrix(adjacency_matrix)

# Compute the maximum flow from node 0 (source) to node 5 (sink)
result = maximum_flow(graph_sparse, 0, 5)

# Retrieve the maximum flow value and the flow along each edge
max_flow_value = result.flow_value
flow_matrix = result.flow

print("Maximum Flow Value:", max_flow_value)
print("Flow Distribution:")
print(flow_matrix.toarray())
Output:
Maximum Flow Value: 23
Flow Distribution:
[[ 0 12 11 0 0 0]
[-12 0 0 12 0 0]
[-11 0 0 0 11 0]
[ 0 -12 0 0 -7 19]
[ 0 0 -11 7 0 4]
[ 0 0 0 -19 -4 0]]
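- The maximum flow value is 23, so at most 23 units of flow can be sent from the source node to the sink node.
- The flow matrix shows the flow along each edge. For example, flow_matrix[0, 1] is the flow from node 0 to node 1, which is 12. The matrix is skew-symmetric: flow_matrix[1, 0] is -12, so a negative entry records flow in the opposite direction. A maximum flow is not necessarily unique, so individual entries may differ between algorithms even though the flow value does not.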
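A maximum flow need not be unique, so it is more robust to check the properties that every valid flow must satisfy than to compare individual entries. The sketch below rebuilds the example and uses plain NumPy to check four things:
- No edge carries more than its capacity.
- The flow matrix is skew-symmetric.
- Flow is conserved at every node other than the source and the sink.
- The net flow out of the source equals flow_value.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

capacity = np.array([[0, 16, 13, 0, 0, 0],
                     [0, 0, 0, 12, 0, 0],
                     [0, 4, 0, 0, 14, 0],
                     [0, 0, 9, 0, 0, 20],
                     [0, 0, 0, 7, 0, 4],
                     [0, 0, 0, 0, 0, 0]])
source, sink = 0, 5

result = maximum_flow(csr_matrix(capacity), source, sink)
flow = result.flow.toarray()

# Capacity constraint: no entry exceeds the capacity of its edge
# (negative entries record flow in the reverse direction)
within_capacity = bool(np.all(flow <= capacity))

# Skew symmetry: flow[i, j] == -flow[j, i]
skew_symmetric = bool(np.array_equal(flow, -flow.T))

# Conservation: net flow is zero at every node except the source and sink
net_out = flow.sum(axis=1)
inner = [v for v in range(len(capacity)) if v not in (source, sink)]
conserved = bool(np.all(net_out[inner] == 0))

print(within_capacity, skew_symmetric, conserved)  # True True True
print(net_out[source] == result.flow_value)        # True
```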
Directed vs. Undirected Graphs
| Feature | Directed Graph | Undirected Graph |
|---|---|---|
| Edge Representation | Edges have a specific direction. In the output of Example 2 (the directed adjacency-matrix example), there is an edge from 0 to 1, so we can move from 0 to 1 but not from 1 to 0. | Edges have no direction. In the output of the undirected graph example, there is an edge from 0 to 1 and also one from 1 to 0, so we can move between 0 and 1 in either direction. |
| Symmetry | The adjacency matrix is generally asymmetric, because the relationship between vertices need not be mutual. The adjacency matrix of Example 2 is asymmetric. | The adjacency matrix is symmetric, because the relationship between vertices is mutual. The adjacency matrix of the undirected graph example is symmetric. |
| Edge Notation | Ordered pair (source vertex, target vertex). | Unordered pair {vertex A, vertex B}. |
| Example | Flow charts, one-way streets | Bidirectional streets |
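The difference shows up directly in connectivity. The sketch below uses the directed adjacency matrix of Example 2. That matrix forms a single weakly connected component, but no node can get back to where it came from, so each node is its own strongly connected component. This is like a network of one-way streets with no return route. Symmetrizing the matrix, the equivalent of making every street bidirectional, merges all nodes into one strongly connected component.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Directed adjacency matrix from Example 2
directed = csr_matrix(np.array([[0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1],
                                [0, 0, 0, 0]]))

# Make every edge bidirectional by keeping the larger of A[i, j] and A[j, i]
undirected = directed.maximum(directed.T).tocsr()


def is_symmetric(m):
    """True if the sparse matrix equals its transpose."""
    return (m != m.T).nnz == 0


print(is_symmetric(directed), is_symmetric(undirected))  # False True

n_weak, _ = connected_components(directed, directed=True, connection="weak")
n_strong, _ = connected_components(directed, directed=True, connection="strong")
n_strong_sym, _ = connected_components(undirected, directed=True,
                                       connection="strong")
print(n_weak, n_strong, n_strong_sym)  # 1 4 1
```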
Conclusion:
Throughout this article, we explored the key features of scipy.sparse.csgraph. We created graphs from dense adjacency matrices and from COO edge lists, and traversed them with breadth-first and depth-first search. We also used algorithms such as Dijkstra's algorithm (through shortest_path) for shortest paths, minimum_spanning_tree for spanning trees, and maximum_flow for network flow problems.
As you continue exploring scipy.sparse.csgraph, you will find more routines for graph-related problems, from traversal and connectivity analysis to bipartite matching and graph Laplacians, all working on the same sparse matrix representation.