IMDB Network Analysis Examples
These examples demonstrate real-world graph analysis using data from the Internet Movie Database (IMDB) and other large-scale networks.
Overview
The IMDB examples show how to:
Process large datasets and build graphs from raw data
Compute shortest paths between entities (actors, papers, etc.)
Analyze network structure and connectivity
Handle multiple related networks
Example Programs
imdb.cpp - Core IMDB Processing
The main IMDB graph loading and processing example. Reads actor-movie relationships and builds the co-star graph.
Key features:
Reading JSON-formatted IMDB data
Building bipartite actor-movie graph
Computing the co-star graph (actors who appeared together)
Shortest path queries
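The co-star construction listed above can be illustrated with a minimal standard-library sketch. It is not the code in imdb.cpp and does not use NWGraph's API; the function name `build_costar_graph` and the input layout (one list of actor ids per movie) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical input: movie_cast[m] lists the actor ids credited in movie m.
// Every id is assumed to be smaller than num_actors.
// Returns the co-star graph as a sorted, duplicate-free adjacency list.
std::vector<std::vector<std::size_t>> build_costar_graph(
    const std::vector<std::vector<std::size_t>>& movie_cast,
    std::size_t num_actors) {
  std::vector<std::set<std::size_t>> neighbors(num_actors);
  for (const auto& cast : movie_cast) {
    // Every pair of actors in the same movie becomes one undirected edge.
    for (std::size_t i = 0; i < cast.size(); ++i) {
      for (std::size_t j = i + 1; j < cast.size(); ++j) {
        if (cast[i] == cast[j]) continue;  // ignore repeated credits
        neighbors[cast[i]].insert(cast[j]);
        neighbors[cast[j]].insert(cast[i]);
      }
    }
  }
  std::vector<std::vector<std::size_t>> adjacency(num_actors);
  for (std::size_t a = 0; a < num_actors; ++a) {
    adjacency[a].assign(neighbors[a].begin(), neighbors[a].end());
  }
  return adjacency;
}
```

The pairwise loop is quadratic in cast size, which is acceptable for typical casts but is one reason a production implementation may prefer a sparse matrix product over the bipartite graph.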
oracle.cpp - Oracle of Bacon
Implementation of the “Oracle of Bacon”: computes the Bacon number (the length of the shortest co-star path to Kevin Bacon) for any actor.
Key features:
BFS-based shortest path computation
Path reconstruction with movie names
Interactive query interface
// Query example
path_to_bacon("Tom Hanks");
// Output: Tom Hanks -> Apollo 13 -> Kevin Bacon (Bacon number: 1)
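For readers who want to see the idea behind such a query, here is a minimal sketch of BFS with parent tracking on a plain adjacency list. It is an illustration under stated assumptions, not the code in oracle.cpp: the name `shortest_path` is hypothetical, vertices are integer actor ids, and both ids are assumed to be valid indices. Reporting movie names along the path would additionally require edge labels, which this sketch omits.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// Returns the vertices on one shortest path from source to target
// (both inclusive), or an empty vector if target is unreachable.
std::vector<std::size_t> shortest_path(
    const std::vector<std::vector<std::size_t>>& adjacency,
    std::size_t source, std::size_t target) {
  const std::size_t unvisited = adjacency.size();  // sentinel, not a valid id
  std::vector<std::size_t> parent(adjacency.size(), unvisited);
  std::queue<std::size_t> frontier;
  parent[source] = source;
  frontier.push(source);
  while (!frontier.empty()) {
    const std::size_t u = frontier.front();
    frontier.pop();
    if (u == target) break;  // BFS reaches target by a shortest path first
    for (std::size_t v : adjacency[u]) {
      if (parent[v] == unvisited) {
        parent[v] = u;
        frontier.push(v);
      }
    }
  }
  if (parent[target] == unvisited) return {};
  // Walk the parent links back from target, then reverse.
  std::vector<std::size_t> path;
  for (std::size_t v = target; v != source; v = parent[v]) path.push_back(v);
  path.push_back(source);
  std::reverse(path.begin(), path.end());
  return path;
}
```

With Kevin Bacon as `target`, the Bacon number of `source` is `path.size() - 1` whenever the returned path is non-empty.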
dblp.cpp - DBLP Academic Network
Analysis of the DBLP computer science bibliography network, where authors are connected through co-authorship.
Key features:
Academic collaboration network
“Erdős number” style analysis
Community detection in co-author networks
dns.cpp - DNS Network Topology
Analysis of DNS (Domain Name System) network topology.
Key features:
Network infrastructure analysis
Connectivity and path analysis
Critical node identification
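The source does not say which method dns.cpp uses to identify critical nodes. One common interpretation is to find articulation points, meaning vertices whose removal disconnects part of the graph. The sketch below uses the classic DFS low-link technique on a plain adjacency list; the name `articulation_points` is hypothetical, and the graph is assumed to be undirected and simple (no parallel edges).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the articulation points of an undirected simple graph, ascending.
std::vector<std::size_t> articulation_points(
    const std::vector<std::vector<std::size_t>>& adjacency) {
  const std::size_t n = adjacency.size();
  std::vector<int> discovery(n, -1), low(n, 0);
  std::vector<bool> is_cut(n, false);
  int timer = 0;

  // parent == n marks the root of a DFS tree.
  std::function<void(std::size_t, std::size_t)> dfs =
      [&](std::size_t u, std::size_t parent) {
        discovery[u] = low[u] = timer++;
        std::size_t children = 0;
        for (std::size_t v : adjacency[u]) {
          if (v == parent) continue;
          if (discovery[v] != -1) {
            low[u] = std::min(low[u], discovery[v]);  // back edge
          } else {
            dfs(v, u);
            low[u] = std::min(low[u], low[v]);
            // No vertex below v reaches above u: removing u cuts v off.
            if (parent != n && low[v] >= discovery[u]) is_cut[u] = true;
            ++children;
          }
        }
        // A root is critical only if it has more than one DFS child.
        if (parent == n && children > 1) is_cut[u] = true;
      };

  for (std::size_t s = 0; s < n; ++s) {
    if (discovery[s] == -1) dfs(s, n);
  }
  std::vector<std::size_t> result;
  for (std::size_t v = 0; v < n; ++v) {
    if (is_cut[v]) result.push_back(v);
  }
  return result;
}
```

The recursion depth can reach the number of vertices, so a graph at the scale described in this document would likely need an iterative variant with an explicit stack.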
oracle+dblp.cpp - Combined Analysis
Demonstrates analyzing multiple networks together, finding connections across different domains.
Building the Examples
The IMDB examples require additional data files not included in the repository. To build:
mkdir build && cd build
cmake .. -DNWGRAPH_BUILD_EXAMPLES=ON
make
Data Requirements
These examples require external data files:
oracle.json - IMDB movie database in JSON format
dblp.json - DBLP bibliography data
See the examples/imdb/download.sh script for data download instructions.
Performance Considerations
These examples demonstrate NWGraph’s ability to handle large graphs:
IMDB: ~500K actors, millions of edges
DBLP: ~2M authors, millions of co-authorships
Key techniques for large-scale analysis:
Efficient data structures: CSR representation for minimal memory
Parallel algorithms: TBB-based parallelism for BFS
Incremental construction: Building graphs from streaming data
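To make the CSR point concrete, the following is a minimal sketch of building a compressed sparse row layout from an edge list with a counting pass and a prefix sum. It does not reproduce NWGraph's own adjacency classes; `CsrGraph` and `build_csr` are hypothetical names, and every edge endpoint is assumed to be smaller than `num_vertices`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Neighbors of vertex v are targets[offsets[v]] .. targets[offsets[v + 1] - 1].
struct CsrGraph {
  std::vector<std::size_t> offsets;  // size num_vertices + 1
  std::vector<std::size_t> targets;  // size num_edges
};

// Builds a CSR graph from directed (source, target) pairs.
// For an undirected graph, pass each edge in both directions.
CsrGraph build_csr(
    std::size_t num_vertices,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  CsrGraph g;
  g.offsets.assign(num_vertices + 1, 0);
  // Pass 1: count the out-degree of each source vertex.
  for (const auto& e : edges) ++g.offsets[e.first + 1];
  // Prefix sum turns the counts into starting offsets.
  for (std::size_t v = 0; v < num_vertices; ++v) {
    g.offsets[v + 1] += g.offsets[v];
  }
  // Pass 2: scatter each target into its source's slot range.
  g.targets.resize(edges.size());
  std::vector<std::size_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
  for (const auto& e : edges) g.targets[cursor[e.first]++] = e.second;
  return g;
}
```

The layout stores one offset per vertex and one entry per edge in two contiguous arrays, which is where the memory savings over per-vertex containers come from.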
See Also
Six Degrees of Separation - Simpler Six Degrees example
Six Degrees of Kevin Bacon (BGL Book Chapter 4.1) - BGL book version of Bacon numbers