2.8: Analysis of graph algorithms

Last updated
Save as PDF

Page ID: 46583

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

Earlier in this chapter I presented an algorithm for checking whether a graph is connected; in the next few chapters, we will see other graph algorithms. Along the way, we will analyze the performance of those algorithms, figuring out how their run times grow as the size of the graphs increases.

If you are not already familiar with analysis of algorithms, you might want to read Appendix B of Think Python, 2nd Edition, at thinkcomplex.com/tp2.

The order of growth for graph algorithms is usually expressed as a function of n, the number of vertices (nodes), and m, the number of edges.

As an example, let’s analyze reachable_nodes from Section 2.5:

def reachable_nodes(G, start): 
    seen = set() 
    stack = [start] 
    while stack: 
        node = stack.pop() 
        if node not in seen: 
            seen.add(node) 
            stack.extend(G.neighbors(node)) 
    return seen

Each time through the loop, we pop a node off the stack; by default, pop removes and returns the last element of a list, which is a constant time operation.

Next we check whether the node is in seen, which is a set, so checking membership is constant time.

If the node is not already in seen, we add it, which is constant time, and then add the neighbors to the stack, which is linear in the number of neighbors.

To express the run time in terms of n and m, we can add up the total number of times each node is added to seen and stack.

Each node is only added to seen once, so the total number of additions is n.

But nodes might be added to stack many times, depending on how many neighbors they have. If a node has k neighbors, it is added to stack k times. Of course, if it has k neighbors, that means it is connected to k edges.

So the total number of additions to stack is the total number of edges, m, doubled because we consider every edge twice.

Therefore, the order of growth for this function is O(n + m), which is a convenient way to say that the run time grows in proportion to either n or m, whichever is bigger.

If we know the relationship between n and m, we can simplify this expression. For example, in a complete graph the number of edges is n(n−1)/2, which is in O(n²). So for a complete graph, reachable_nodes is quadratic in n.