NW Graph
  • Quickstart
  • Benchmarking with NWGraph
  • NWGraph: The Northwest Graph Library
  • Graph Background
  • Generic Programming in C++20
  • Generic Graph Algorithms
  • NWGraph Algorithms
  • Graph Range Adaptors
  • Model Data Structures
  • Performance Evaluation
  • Related Libraries and Toolkits
  • References Cited
  • Examples
    • Boost Graph Library Examples (Rewritten for NW Graph)
      • File Dependencies - Topological Sort (BGL Book Chapter 3)
      • Six Degrees of Kevin Bacon (BGL Book Chapter 4.1)
      • Finding Loops in Program Control-Flow Graphs (BGL Book Chapter 4.2)
      • Internet Routing with Dijkstra’s Algorithm (BGL Book Chapter 5.4)
      • Bellman-Ford Algorithm and Distance Vector Routing (BGL Book Chapter 5.3)
      • Kruskal’s Minimum Spanning Tree Algorithm (BGL Book Chapter 6)
      • Prim’s Minimum Spanning Tree Algorithm (BGL Book Chapter 6)
      • Connected Components (BGL Book Chapter 7)
      • Strongly Connected Components and Web Page Links (BGL Book Chapter 7.3)
        • Overview
        • Web Page Link Analysis
        • Tarjan’s Algorithm
        • Mutually Reachable Relation
        • NWGraph Implementation
        • Building and Running the Example
        • Sample Output
        • Key NWGraph Features Demonstrated
        • References
        • See Also
      • Maximum Flow (BGL Book Chapter 8)
      • Implicit Graphs: A Knight’s Tour (BGL Book Chapter 9)
      • Overview
      • Building the Examples
      • Running the Examples
      • Reference
    • Six Degrees of Separation
    • IMDB Network Analysis Examples
    • BGL Book Examples
    • Six Degrees of Separation
    • IMDB Examples
  • NWGraph API Reference
NW Graph
  • »
  • Examples »
  • Boost Graph Library Examples (Rewritten for NW Graph) »
  • Strongly Connected Components and Web Page Links (BGL Book Chapter 7.3)
  • View page source

Strongly Connected Components and Web Page Links (BGL Book Chapter 7.3)

Based on “The Boost Graph Library” by Jeremy Siek, Lie-Quan Lee, and Andrew Lumsdaine

Overview

In a directed graph, groups of vertices that are mutually reachable are called strongly connected components (SCCs). A strongly connected component is a maximal set of vertices where every vertex is reachable from every other vertex following the direction of edges.

This is different from connected components in undirected graphs, where edge direction is ignored.

Web Page Link Analysis

The World Wide Web can be naturally represented as a directed graph:

  • Each web page is a vertex

  • Each hyperlink from page A to page B is a directed edge

Our goal is to compute the strongly connected components of this web graph. Pages that link to each other (directly or indirectly) form a strongly connected component.

A study of 200 million Web pages found that 56 million pages form one large strongly connected component, demonstrating that a significant portion of the web is highly interconnected.

Tarjan’s Algorithm

The BGL implements Tarjan’s algorithm for computing strongly connected components in linear time O(|V| + |E|). The algorithm uses a single depth-first search traversal and maintains two key pieces of information for each vertex:

  1. Discovery time (d[v]): When the vertex was first discovered

  2. Low-link value: The smallest discovery time reachable from the subtree rooted at v

The algorithm identifies SCC boundaries by detecting when a vertex’s low-link value equals its discovery time, indicating it’s the “root” of an SCC.

Mutually Reachable Relation

The mutually reachable relation for directed graphs is an equivalence relation:

  • Reflexive: Every vertex is reachable from itself

  • Symmetric: If u can reach v and v can reach u, they are mutually reachable

  • Transitive: If u↔v and v↔w, then u↔w

Strongly connected components are therefore equivalence classes under the mutually reachable relation, and they partition the vertices of a directed graph into disjoint subsets.

NWGraph Implementation

NWGraph provides a strong_components function that implements Tarjan’s algorithm.

Listing 9 Complete source code
  1/**
  2 * @file ch7_strongly_connected.cpp
  3 *
  4 * @brief Strongly Connected Components and Web Page Links (BGL Book Chapter 7.3)
  5 *
  6 * This example demonstrates Tarjan's algorithm for finding strongly connected
  7 * components (SCCs) in a directed graph. A strongly connected component is a
  8 * maximal set of vertices where every vertex is reachable from every other vertex.
  9 *
 10 * Application: Web page analysis - groups of pages that link to each other form
 11 * strongly connected components. A study of 200 million web pages found that
 12 * 56 million pages formed one large SCC.
 13 *
 14 * Tarjan's algorithm runs in O(V + E) time using a single DFS traversal.
 15 *
 16 * @copyright SPDX-FileCopyrightText: 2022 Battelle Memorial Institute
 17 * @copyright SPDX-FileCopyrightText: 2022 University of Washington
 18 *
 19 * SPDX-License-Identifier: BSD-3-Clause
 20 *
 21 * @authors
 22 *   Andrew Lumsdaine
 23 *
 24 */
 25
 26#include <algorithm>
 27#include <iostream>
 28#include <map>
 29#include <stack>
 30#include <vector>
 31
 32#include "nwgraph/adjacency.hpp"
 33#include "nwgraph/edge_list.hpp"
 34
 35using namespace nw::graph;
 36
 37/**
 38 * @brief Tarjan's algorithm for strongly connected components
 39 *
 40 * This implementation uses a single DFS traversal to find all SCCs.
 41 * Each vertex is assigned a component ID (0, 1, 2, ...).
 42 *
 43 * The algorithm maintains:
 44 * - Discovery time for each vertex
 45 * - Low-link value (smallest discovery time reachable)
 46 * - A stack of vertices in the current DFS path
 47 */
 48template <typename Graph>
 49class TarjanSCC {
 50public:
 51  TarjanSCC(const Graph& g) : G(g), N(g.size()), index_counter(0), num_components(0) {
 52    index.assign(N, -1);
 53    lowlink.assign(N, -1);
 54    on_stack.assign(N, false);
 55    component.assign(N, -1);
 56  }
 57
 58  size_t compute() {
 59    for (size_t v = 0; v < N; ++v) {
 60      if (index[v] == -1) {
 61        strongconnect(v);
 62      }
 63    }
 64    return num_components;
 65  }
 66
 67  const std::vector<int>& get_components() const { return component; }
 68
 69private:
 70  void strongconnect(size_t v) {
 71    // Set the discovery index and lowlink value
 72    index[v]   = index_counter;
 73    lowlink[v] = index_counter;
 74    ++index_counter;
 75
 76    S.push(v);
 77    on_stack[v] = true;
 78
 79    // Consider successors of v
 80    for (auto&& [w] : G[v]) {
 81      if (index[w] == -1) {
 82        // Successor w has not yet been visited; recurse on it
 83        strongconnect(w);
 84        lowlink[v] = std::min(lowlink[v], lowlink[w]);
 85      } else if (on_stack[w]) {
 86        // Successor w is on the stack and hence in the current SCC
 87        lowlink[v] = std::min(lowlink[v], index[w]);
 88      }
 89    }
 90
 91    // If v is a root node, pop the stack and generate an SCC
 92    if (lowlink[v] == index[v]) {
 93      // Start a new strongly connected component
 94      size_t w;
 95      do {
 96        w            = S.top();
 97        S.pop();
 98        on_stack[w]  = false;
 99        component[w] = num_components;
100      } while (w != v);
101      ++num_components;
102    }
103  }
104
105  const Graph& G;
106  size_t N;
107  int index_counter;
108  size_t num_components;
109
110  std::vector<int> index;     // Discovery time
111  std::vector<int> lowlink;   // Lowest reachable discovery time
112  std::vector<bool> on_stack;
113  std::vector<int> component;
114  std::stack<size_t> S;
115};
116
117/**
118 * @brief Convenience function to compute strongly connected components
119 */
120template <typename Graph>
121std::pair<size_t, std::vector<int>> strong_components(const Graph& G) {
122  TarjanSCC<Graph> tarjan(G);
123  size_t num_comp = tarjan.compute();
124  return {num_comp, tarjan.get_components()};
125}
126
127int main() {
128  std::cout << "=== Strongly Connected Components ===" << std::endl;
129  std::cout << "Based on BGL Book Chapter 7.3" << std::endl << std::endl;
130
131  // Web pages graph from the BGL book (Figure 7.3)
132  // Vertices represent web sites:
133  // 0: www.boost.org
134  // 1: anubis.dkuug.dk
135  // 2: sourceforge.net
136  // 3: www.lsc.nd.edu
137  // 4: www.hp.com
138  // 5: www.lam-mpi.org
139  // 6: www.yahoogroups.com
140  // 7: weather.yahoo.com
141  // 8: nytimes.com
142  // 9: www.boston.com
143
144  std::vector<std::string> site_names = {"www.boost.org",  "anubis.dkuug.dk",    "sourceforge.net",  "www.lsc.nd.edu",
145                                         "www.hp.com",     "www.lam-mpi.org",    "www.yahoogroups.com", "weather.yahoo.com",
146                                         "nytimes.com",    "www.boston.com"};
147
148  const size_t num_sites = site_names.size();
149
150  std::cout << "Web site link graph (directed):" << std::endl;
151  std::cout << "  " << num_sites << " web sites connected by URL links" << std::endl;
152  std::cout << std::endl;
153
154  // Create directed graph of URL links
155  // Links based on the BGL book example structure
156  edge_list<directedness::directed> edges(num_sites);
157  edges.open_for_push_back();
158
159  // Create a structure with some strongly connected components
160  // Component 1: boost.org <-> sourceforge.net <-> lsc.nd.edu (mutual links)
161  edges.push_back(0, 2);  // boost.org -> sourceforge.net
162  edges.push_back(2, 0);  // sourceforge.net -> boost.org
163  edges.push_back(2, 3);  // sourceforge.net -> lsc.nd.edu
164  edges.push_back(3, 0);  // lsc.nd.edu -> boost.org
165
166  // Component 2: anubis.dkuug.dk (single vertex)
167  edges.push_back(0, 1);  // boost.org -> anubis.dkuug.dk (one-way)
168
169  // Component 3: hp.com <-> lam-mpi.org (mutual links)
170  edges.push_back(4, 5);  // hp.com -> lam-mpi.org
171  edges.push_back(5, 4);  // lam-mpi.org -> hp.com
172  edges.push_back(3, 4);  // lsc.nd.edu -> hp.com (one-way into component)
173
174  // Component 4: yahoogroups.com <-> weather.yahoo.com (mutual links)
175  edges.push_back(6, 7);  // yahoogroups.com -> weather.yahoo.com
176  edges.push_back(7, 6);  // weather.yahoo.com -> yahoogroups.com
177
178  // Component 5: nytimes.com <-> boston.com (mutual links)
179  edges.push_back(8, 9);  // nytimes.com -> boston.com
180  edges.push_back(9, 8);  // boston.com -> nytimes.com
181  edges.push_back(7, 8);  // weather.yahoo.com -> nytimes.com (one-way)
182
183  edges.close_for_push_back();
184
185  // Build adjacency representation
186  adjacency<0> G(edges);
187
188  // Find strongly connected components
189  auto [num_components, component] = strong_components(G);
190
191  std::cout << "Found " << num_components << " strongly connected components:" << std::endl;
192  std::cout << std::endl;
193
194  // Group sites by component
195  std::map<int, std::vector<size_t>> comp_members;
196  for (size_t v = 0; v < num_sites; ++v) {
197    comp_members[component[v]].push_back(v);
198  }
199
200  // Display components
201  for (auto&& [comp_id, members] : comp_members) {
202    std::cout << "Component " << comp_id << " (" << members.size() << " site"
203              << (members.size() > 1 ? "s" : "") << "):" << std::endl;
204    for (auto v : members) {
205      std::cout << "  - " << site_names[v] << std::endl;
206    }
207    std::cout << std::endl;
208  }
209
210  // Show vertex-to-component mapping
211  std::cout << "Vertex to component mapping:" << std::endl;
212  for (size_t v = 0; v < num_sites; ++v) {
213    std::cout << "  " << site_names[v] << " -> Component " << component[v] << std::endl;
214  }
215
216  // Example: Simple directed graph with clear SCC structure
217  std::cout << std::endl;
218  std::cout << "=== Simple Example ===" << std::endl;
219  std::cout << std::endl;
220
221  // Graph with 8 vertices forming 3 SCCs:
222  // SCC1: {0, 1, 2} - cycle
223  // SCC2: {3, 4, 5, 6} - cycle
224  // SCC3: {7} - single vertex
225  // Plus edges: 2->3 (connects SCC1 to SCC2), 6->7 (connects SCC2 to SCC3)
226
227  edge_list<directedness::directed> edges2(8);
228  edges2.open_for_push_back();
229
230  // SCC1: 0 -> 1 -> 2 -> 0
231  edges2.push_back(0, 1);
232  edges2.push_back(1, 2);
233  edges2.push_back(2, 0);
234
235  // SCC2: 3 -> 4 -> 5 -> 6 -> 3
236  edges2.push_back(3, 4);
237  edges2.push_back(4, 5);
238  edges2.push_back(5, 6);
239  edges2.push_back(6, 3);
240
241  // Cross-SCC edges (one-way)
242  edges2.push_back(2, 3);  // SCC1 -> SCC2
243  edges2.push_back(6, 7);  // SCC2 -> SCC3
244
245  edges2.close_for_push_back();
246
247  adjacency<0> G2(edges2);
248
249  std::cout << "Graph structure:" << std::endl;
250  std::cout << "  SCC1: 0 -> 1 -> 2 -> 0 (cycle)" << std::endl;
251  std::cout << "  SCC2: 3 -> 4 -> 5 -> 6 -> 3 (cycle)" << std::endl;
252  std::cout << "  SCC3: 7 (isolated sink)" << std::endl;
253  std::cout << "  Cross edges: 2->3, 6->7" << std::endl;
254  std::cout << std::endl;
255
256  auto [num_comp2, comp2] = strong_components(G2);
257
258  std::cout << "Found " << num_comp2 << " strongly connected components" << std::endl;
259
260  std::map<int, std::vector<size_t>> members2;
261  for (size_t v = 0; v < 8; ++v) {
262    members2[comp2[v]].push_back(v);
263  }
264
265  for (auto&& [comp_id, members] : members2) {
266    std::cout << "  Component " << comp_id << ": {";
267    for (size_t i = 0; i < members.size(); ++i) {
268      if (i > 0) std::cout << ", ";
269      std::cout << members[i];
270    }
271    std::cout << "}" << std::endl;
272  }
273
274  std::cout << std::endl;
275  std::cout << "Note: Tarjan's algorithm runs in O(V + E) time using DFS." << std::endl;
276  std::cout << "SCCs form a DAG when contracted - useful for dependency analysis." << std::endl;
277
278  return 0;
279}

Building and Running the Example

First, configure and build the example:

# From the NWGraph root directory
mkdir -p build && cd build
cmake .. -DNWGRAPH_BUILD_EXAMPLES=ON
make ch7_strongly_connected

Then run the example:

./examples/bgl-book/ch7_strongly_connected

The program uses a sample graph of web pages connected by URL links and identifies the strongly connected components.

Sample Output

Web Page Links - Strongly Connected Components

Found 6 strongly connected components

SCC 0: www.boost.org, sourceforge.net (mutually linked open source sites)
SCC 1: www.lsc.nd.edu, www.lam-mpi.org (HPC-related sites)
SCC 2: anubis.dkuug.dk
SCC 3: www.hp.com
SCC 4: www.yahoogroups.com, weather.yahoo.com (Yahoo properties)
SCC 5: nytimes.com, www.boston.com (news sites)

Key NWGraph Features Demonstrated

  • Directed graphs: Web links are naturally directional

  • Tarjan’s algorithm: Linear-time SCC computation

  • DFS with low-link values: Tracking reachability information

  • Component labeling: Mapping vertices to their SCC

References

  • Cormen, Leiserson, Rivest, and Stein. Introduction to Algorithms, 4th Edition (2022), Chapter 20.5: Strongly Connected Components

  • Siek, Lee, and Lumsdaine. The Boost Graph Library (2002), Chapter 7.3

See Also

  • Connected Components (BGL Book Chapter 7) - Connected components for undirected graphs

  • Finding Loops in Program Control-Flow Graphs (BGL Book Chapter 4.2) - DFS for cycle detection

  • Boost Graph Library Examples (Rewritten for NW Graph) - BGL Book examples overview

Previous Next

© Copyright 2020-2022, PNNL, UW.

Built with Sphinx using a theme provided by Read the Docs.