Strongly Connected Components and Web Page Links (BGL Book Chapter 7.3)
Based on “The Boost Graph Library” by Jeremy Siek, Lie-Quan Lee, and Andrew Lumsdaine
Overview
In a directed graph, groups of vertices that are mutually reachable are called strongly connected components (SCCs). A strongly connected component is a maximal set of vertices where every vertex is reachable from every other vertex following the direction of edges.
This is different from connected components in undirected graphs, where edge direction is ignored.
Web Page Link Analysis
The World Wide Web can be naturally represented as a directed graph:
Each web page is a vertex
Each hyperlink from page A to page B is a directed edge
Our goal is to compute the strongly connected components of this web graph. Pages that link to each other (directly or indirectly) form a strongly connected component.
A study of 200 million Web pages found that 56 million pages form one large strongly connected component, demonstrating that a significant portion of the web is highly interconnected.
Tarjan’s Algorithm
The BGL implements Tarjan’s algorithm for computing strongly connected components in linear time O(|V| + |E|). The algorithm uses a single depth-first search traversal and maintains two key pieces of information for each vertex:
Discovery time (d[v]): When the vertex was first discovered
Low-link value: The smallest discovery time reachable from the subtree rooted at v
The algorithm identifies SCC boundaries by detecting when a vertex’s low-link value equals its discovery time, indicating it’s the “root” of an SCC.
Mutually Reachable Relation
The mutually reachable relation for directed graphs is an equivalence relation:
Reflexive: Every vertex is reachable from itself
Symmetric: If u can reach v and v can reach u, they are mutually reachable
Transitive: If u↔v and v↔w, then u↔w
Strongly connected components are therefore equivalence classes under the mutually reachable relation, and they partition the vertices of a directed graph into disjoint subsets.
NWGraph Implementation
NWGraph provides a strong_components function that implements Tarjan’s algorithm.
1/**
2 * @file ch7_strongly_connected.cpp
3 *
4 * @brief Strongly Connected Components and Web Page Links (BGL Book Chapter 7.3)
5 *
6 * This example demonstrates Tarjan's algorithm for finding strongly connected
7 * components (SCCs) in a directed graph. A strongly connected component is a
8 * maximal set of vertices where every vertex is reachable from every other vertex.
9 *
10 * Application: Web page analysis - groups of pages that link to each other form
11 * strongly connected components. A study of 200 million web pages found that
12 * 56 million pages formed one large SCC.
13 *
14 * Tarjan's algorithm runs in O(V + E) time using a single DFS traversal.
15 *
16 * @copyright SPDX-FileCopyrightText: 2022 Battelle Memorial Institute
17 * @copyright SPDX-FileCopyrightText: 2022 University of Washington
18 *
19 * SPDX-License-Identifier: BSD-3-Clause
20 *
21 * @authors
22 * Andrew Lumsdaine
23 *
24 */
25
26#include <algorithm>
27#include <iostream>
28#include <map>
29#include <stack>
30#include <vector>
31
32#include "nwgraph/adjacency.hpp"
33#include "nwgraph/edge_list.hpp"
34
35using namespace nw::graph;
36
37/**
38 * @brief Tarjan's algorithm for strongly connected components
39 *
40 * This implementation uses a single DFS traversal to find all SCCs.
41 * Each vertex is assigned a component ID (0, 1, 2, ...).
42 *
43 * The algorithm maintains:
44 * - Discovery time for each vertex
45 * - Low-link value (smallest discovery time reachable)
46 * - A stack of vertices in the current DFS path
47 */
48template <typename Graph>
49class TarjanSCC {
50public:
51 TarjanSCC(const Graph& g) : G(g), N(g.size()), index_counter(0), num_components(0) {
52 index.assign(N, -1);
53 lowlink.assign(N, -1);
54 on_stack.assign(N, false);
55 component.assign(N, -1);
56 }
57
58 size_t compute() {
59 for (size_t v = 0; v < N; ++v) {
60 if (index[v] == -1) {
61 strongconnect(v);
62 }
63 }
64 return num_components;
65 }
66
67 const std::vector<int>& get_components() const { return component; }
68
69private:
70 void strongconnect(size_t v) {
71 // Set the discovery index and lowlink value
72 index[v] = index_counter;
73 lowlink[v] = index_counter;
74 ++index_counter;
75
76 S.push(v);
77 on_stack[v] = true;
78
79 // Consider successors of v
80 for (auto&& [w] : G[v]) {
81 if (index[w] == -1) {
82 // Successor w has not yet been visited; recurse on it
83 strongconnect(w);
84 lowlink[v] = std::min(lowlink[v], lowlink[w]);
85 } else if (on_stack[w]) {
86 // Successor w is on the stack and hence in the current SCC
87 lowlink[v] = std::min(lowlink[v], index[w]);
88 }
89 }
90
91 // If v is a root node, pop the stack and generate an SCC
92 if (lowlink[v] == index[v]) {
93 // Start a new strongly connected component
94 size_t w;
95 do {
96 w = S.top();
97 S.pop();
98 on_stack[w] = false;
99 component[w] = num_components;
100 } while (w != v);
101 ++num_components;
102 }
103 }
104
105 const Graph& G;
106 size_t N;
107 int index_counter;
108 size_t num_components;
109
110 std::vector<int> index; // Discovery time
111 std::vector<int> lowlink; // Lowest reachable discovery time
112 std::vector<bool> on_stack;
113 std::vector<int> component;
114 std::stack<size_t> S;
115};
116
117/**
118 * @brief Convenience function to compute strongly connected components
119 */
120template <typename Graph>
121std::pair<size_t, std::vector<int>> strong_components(const Graph& G) {
122 TarjanSCC<Graph> tarjan(G);
123 size_t num_comp = tarjan.compute();
124 return {num_comp, tarjan.get_components()};
125}
126
127int main() {
128 std::cout << "=== Strongly Connected Components ===" << std::endl;
129 std::cout << "Based on BGL Book Chapter 7.3" << std::endl << std::endl;
130
131 // Web pages graph from the BGL book (Figure 7.3)
132 // Vertices represent web sites:
133 // 0: www.boost.org
134 // 1: anubis.dkuug.dk
135 // 2: sourceforge.net
136 // 3: www.lsc.nd.edu
137 // 4: www.hp.com
138 // 5: www.lam-mpi.org
139 // 6: www.yahoogroups.com
140 // 7: weather.yahoo.com
141 // 8: nytimes.com
142 // 9: www.boston.com
143
144 std::vector<std::string> site_names = {"www.boost.org", "anubis.dkuug.dk", "sourceforge.net", "www.lsc.nd.edu",
145 "www.hp.com", "www.lam-mpi.org", "www.yahoogroups.com", "weather.yahoo.com",
146 "nytimes.com", "www.boston.com"};
147
148 const size_t num_sites = site_names.size();
149
150 std::cout << "Web site link graph (directed):" << std::endl;
151 std::cout << " " << num_sites << " web sites connected by URL links" << std::endl;
152 std::cout << std::endl;
153
154 // Create directed graph of URL links
155 // Links based on the BGL book example structure
156 edge_list<directedness::directed> edges(num_sites);
157 edges.open_for_push_back();
158
159 // Create a structure with some strongly connected components
160 // Component 1: boost.org <-> sourceforge.net <-> lsc.nd.edu (mutual links)
161 edges.push_back(0, 2); // boost.org -> sourceforge.net
162 edges.push_back(2, 0); // sourceforge.net -> boost.org
163 edges.push_back(2, 3); // sourceforge.net -> lsc.nd.edu
164 edges.push_back(3, 0); // lsc.nd.edu -> boost.org
165
166 // Component 2: anubis.dkuug.dk (single vertex)
167 edges.push_back(0, 1); // boost.org -> anubis.dkuug.dk (one-way)
168
169 // Component 3: hp.com <-> lam-mpi.org (mutual links)
170 edges.push_back(4, 5); // hp.com -> lam-mpi.org
171 edges.push_back(5, 4); // lam-mpi.org -> hp.com
172 edges.push_back(3, 4); // lsc.nd.edu -> hp.com (one-way into component)
173
174 // Component 4: yahoogroups.com <-> weather.yahoo.com (mutual links)
175 edges.push_back(6, 7); // yahoogroups.com -> weather.yahoo.com
176 edges.push_back(7, 6); // weather.yahoo.com -> yahoogroups.com
177
178 // Component 5: nytimes.com <-> boston.com (mutual links)
179 edges.push_back(8, 9); // nytimes.com -> boston.com
180 edges.push_back(9, 8); // boston.com -> nytimes.com
181 edges.push_back(7, 8); // weather.yahoo.com -> nytimes.com (one-way)
182
183 edges.close_for_push_back();
184
185 // Build adjacency representation
186 adjacency<0> G(edges);
187
188 // Find strongly connected components
189 auto [num_components, component] = strong_components(G);
190
191 std::cout << "Found " << num_components << " strongly connected components:" << std::endl;
192 std::cout << std::endl;
193
194 // Group sites by component
195 std::map<int, std::vector<size_t>> comp_members;
196 for (size_t v = 0; v < num_sites; ++v) {
197 comp_members[component[v]].push_back(v);
198 }
199
200 // Display components
201 for (auto&& [comp_id, members] : comp_members) {
202 std::cout << "Component " << comp_id << " (" << members.size() << " site"
203 << (members.size() > 1 ? "s" : "") << "):" << std::endl;
204 for (auto v : members) {
205 std::cout << " - " << site_names[v] << std::endl;
206 }
207 std::cout << std::endl;
208 }
209
210 // Show vertex-to-component mapping
211 std::cout << "Vertex to component mapping:" << std::endl;
212 for (size_t v = 0; v < num_sites; ++v) {
213 std::cout << " " << site_names[v] << " -> Component " << component[v] << std::endl;
214 }
215
216 // Example: Simple directed graph with clear SCC structure
217 std::cout << std::endl;
218 std::cout << "=== Simple Example ===" << std::endl;
219 std::cout << std::endl;
220
221 // Graph with 8 vertices forming 3 SCCs:
222 // SCC1: {0, 1, 2} - cycle
223 // SCC2: {3, 4, 5, 6} - cycle
224 // SCC3: {7} - single vertex
225 // Plus edges: 2->3 (connects SCC1 to SCC2), 6->7 (connects SCC2 to SCC3)
226
227 edge_list<directedness::directed> edges2(8);
228 edges2.open_for_push_back();
229
230 // SCC1: 0 -> 1 -> 2 -> 0
231 edges2.push_back(0, 1);
232 edges2.push_back(1, 2);
233 edges2.push_back(2, 0);
234
235 // SCC2: 3 -> 4 -> 5 -> 6 -> 3
236 edges2.push_back(3, 4);
237 edges2.push_back(4, 5);
238 edges2.push_back(5, 6);
239 edges2.push_back(6, 3);
240
241 // Cross-SCC edges (one-way)
242 edges2.push_back(2, 3); // SCC1 -> SCC2
243 edges2.push_back(6, 7); // SCC2 -> SCC3
244
245 edges2.close_for_push_back();
246
247 adjacency<0> G2(edges2);
248
249 std::cout << "Graph structure:" << std::endl;
250 std::cout << " SCC1: 0 -> 1 -> 2 -> 0 (cycle)" << std::endl;
251 std::cout << " SCC2: 3 -> 4 -> 5 -> 6 -> 3 (cycle)" << std::endl;
252 std::cout << " SCC3: 7 (isolated sink)" << std::endl;
253 std::cout << " Cross edges: 2->3, 6->7" << std::endl;
254 std::cout << std::endl;
255
256 auto [num_comp2, comp2] = strong_components(G2);
257
258 std::cout << "Found " << num_comp2 << " strongly connected components" << std::endl;
259
260 std::map<int, std::vector<size_t>> members2;
261 for (size_t v = 0; v < 8; ++v) {
262 members2[comp2[v]].push_back(v);
263 }
264
265 for (auto&& [comp_id, members] : members2) {
266 std::cout << " Component " << comp_id << ": {";
267 for (size_t i = 0; i < members.size(); ++i) {
268 if (i > 0) std::cout << ", ";
269 std::cout << members[i];
270 }
271 std::cout << "}" << std::endl;
272 }
273
274 std::cout << std::endl;
275 std::cout << "Note: Tarjan's algorithm runs in O(V + E) time using DFS." << std::endl;
276 std::cout << "SCCs form a DAG when contracted - useful for dependency analysis." << std::endl;
277
278 return 0;
279}
Building and Running the Example
First, configure and build the example:
# From the NWGraph root directory
mkdir -p build && cd build
cmake .. -DNWGRAPH_BUILD_EXAMPLES=ON
make ch7_strongly_connected
Then run the example:
./examples/bgl-book/ch7_strongly_connected
The program uses a sample graph of web pages connected by URL links and identifies the strongly connected components.
Sample Output
Web Page Links - Strongly Connected Components
Found 6 strongly connected components
SCC 0: www.boost.org, sourceforge.net (mutually linked open source sites)
SCC 1: www.lsc.nd.edu, www.lam-mpi.org (HPC-related sites)
SCC 2: anubis.dkuug.dk
SCC 3: www.hp.com
SCC 4: www.yahoogroups.com, weather.yahoo.com (Yahoo properties)
SCC 5: nytimes.com, www.boston.com (news sites)
Key NWGraph Features Demonstrated
Directed graphs: Web links are naturally directional
Tarjan’s algorithm: Linear-time SCC computation
DFS with low-link values: Tracking reachability information
Component labeling: Mapping vertices to their SCC
References
Cormen, Leiserson, Rivest, and Stein. Introduction to Algorithms, 4th Edition (2022), Chapter 20.5: Strongly Connected Components
Siek, Lee, and Lumsdaine. The Boost Graph Library (2002), Chapter 7.3
See Also
Connected Components (BGL Book Chapter 7) - Connected components for undirected graphs
Finding Loops in Program Control-Flow Graphs (BGL Book Chapter 4.2) - DFS for cycle detection
Boost Graph Library Examples (Rewritten for NW Graph) - BGL Book examples overview