IMDB Network Analysis Examples

These examples demonstrate real-world graph analysis using data from the Internet Movie Database (IMDB) and other large-scale networks: the DBLP bibliography and DNS topology.

Overview

The examples show how to:

  • Process large datasets and build graphs from raw data

  • Compute shortest paths between entities (actors, papers, etc.)

  • Analyze network structure and connectivity

  • Handle multiple related networks

Example Programs

imdb.cpp - Core IMDB Processing

The main IMDB graph loading and processing example. Reads actor-movie relationships and builds the co-star graph.

Key features:

  • Reading JSON-formatted IMDB data

  • Building bipartite actor-movie graph

  • Computing the co-star graph (actors who appeared together)

  • Shortest path queries
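
The step from the bipartite actor-movie graph to the co-star graph can be sketched as follows. This is a minimal illustration, not the code in imdb.cpp: build_costar_graph is a hypothetical helper, actors are assumed to be numbered 0..num_actors-1, and each movie is assumed to be given as a list of actor ids.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of NWGraph. movie_casts[m] lists the actor
// ids appearing in movie m. The result is an adjacency list in which two
// actors are neighbors if they share at least one movie.
std::vector<std::vector<std::size_t>>
build_costar_graph(std::size_t num_actors,
                   const std::vector<std::vector<std::size_t>>& movie_casts) {
  std::vector<std::vector<std::size_t>> costars(num_actors);
  for (const auto& cast : movie_casts) {
    // Every pair of actors in the same cast becomes an undirected edge.
    for (std::size_t i = 0; i < cast.size(); ++i) {
      for (std::size_t j = i + 1; j < cast.size(); ++j) {
        assert(cast[i] < num_actors && cast[j] < num_actors);
        costars[cast[i]].push_back(cast[j]);
        costars[cast[j]].push_back(cast[i]);
      }
    }
  }
  // Two actors may share several movies; keep a single edge per pair.
  for (auto& neighbors : costars) {
    std::sort(neighbors.begin(), neighbors.end());
    neighbors.erase(std::unique(neighbors.begin(), neighbors.end()),
                    neighbors.end());
  }
  return costars;
}
```

The pairwise expansion costs time quadratic in the cast size of each movie, which is one reason to keep the bipartite graph around and expand only when the co-star view is needed.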

oracle.cpp - Oracle of Bacon

An implementation of the “Oracle of Bacon”: it computes the Bacon number of any actor, that is, the number of co-star links on the shortest path from that actor to Kevin Bacon.

Key features:

  • BFS-based shortest path computation

  • Path reconstruction with movie names

  • Interactive query interface

// Query example
path_to_bacon("Tom Hanks");
// Output: Tom Hanks -> Apollo 13 -> Kevin Bacon (Bacon number: 1)
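
A BFS with path reconstruction might look like the sketch below. It is not the code in oracle.cpp: bfs_path is a hypothetical helper, and it assumes the search runs on the bipartite graph, where actors and movies are both vertices, so that the recovered path alternates actor, movie, actor and the movie names come for free.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper, not the oracle.cpp implementation. Returns the vertex
// sequence of a shortest path from source to target, or an empty vector if
// the two vertices are not connected.
std::vector<std::size_t>
bfs_path(const std::vector<std::vector<std::size_t>>& adjacency,
         std::size_t source, std::size_t target) {
  assert(source < adjacency.size() && target < adjacency.size());
  const std::size_t unvisited = adjacency.size();  // no valid vertex has this id
  std::vector<std::size_t> parent(adjacency.size(), unvisited);
  std::queue<std::size_t> frontier;
  parent[source] = source;
  frontier.push(source);
  while (!frontier.empty()) {
    std::size_t u = frontier.front();
    frontier.pop();
    if (u == target) break;  // BFS reaches target along a shortest path first
    for (std::size_t v : adjacency[u]) {
      if (parent[v] == unvisited) {
        parent[v] = u;
        frontier.push(v);
      }
    }
  }
  if (parent[target] == unvisited) return {};
  // Walk the parent links back from target, then reverse.
  std::vector<std::size_t> path;
  for (std::size_t v = target; v != source; v = parent[v]) path.push_back(v);
  path.push_back(source);
  std::reverse(path.begin(), path.end());
  return path;
}
```

On an alternating actor-movie path, every second vertex is a movie, so under these assumptions the Bacon number of a non-empty path is (path.size() - 1) / 2.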

dblp.cpp - DBLP Academic Network

Analysis of the DBLP computer science bibliography network, where authors are connected through co-authorship.

Key features:

  • Academic collaboration network

  • “Erdős number” style analysis

  • Community detection in co-author networks

dns.cpp - DNS Network Topology

Analysis of DNS (Domain Name System) network topology.

Key features:

  • Network infrastructure analysis

  • Connectivity and path analysis

  • Critical node identification
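
One common reading of "critical node" is an articulation point: a vertex whose removal disconnects part of the network. The sketch below finds articulation points with a depth-first search (Tarjan's low-link method). Whether dns.cpp uses this definition is an assumption; articulation_points is a hypothetical helper, and the graph is assumed to be simple and undirected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper; dns.cpp may define "critical" differently.
// Returns the articulation points of a simple undirected graph, ascending.
std::vector<std::size_t>
articulation_points(const std::vector<std::vector<std::size_t>>& adjacency) {
  const std::size_t n = adjacency.size();
  const std::size_t none = n;  // sentinel: "not visited" / "no parent"
  std::vector<std::size_t> discovery(n, none), low(n, 0);
  std::vector<bool> is_cut(n, false);
  std::size_t timer = 0;

  std::function<void(std::size_t, std::size_t)> dfs =
      [&](std::size_t u, std::size_t parent) {
        discovery[u] = low[u] = timer++;
        std::size_t children = 0;
        for (std::size_t v : adjacency[u]) {
          assert(v < n);
          if (v == parent) continue;
          if (discovery[v] != none) {
            // Back edge: u can reach an earlier-discovered vertex.
            low[u] = std::min(low[u], discovery[v]);
          } else {
            dfs(v, u);
            ++children;
            low[u] = std::min(low[u], low[v]);
            // v's subtree cannot climb above u, so u separates it.
            if (parent != none && low[v] >= discovery[u]) is_cut[u] = true;
          }
        }
        // A DFS root is a cut vertex only if it has several DFS children.
        if (parent == none && children > 1) is_cut[u] = true;
      };

  for (std::size_t s = 0; s < n; ++s)
    if (discovery[s] == none) dfs(s, none);

  std::vector<std::size_t> result;
  for (std::size_t v = 0; v < n; ++v)
    if (is_cut[v]) result.push_back(v);
  return result;
}
```

The recursion keeps the sketch short; for graphs with very long paths an explicit stack would avoid deep call chains.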

oracle+dblp.cpp - Combined Analysis

Demonstrates analyzing multiple networks together, finding connections across different domains.

Building the Examples

The examples need additional data files at run time that are not included in the repository (see Data Requirements below). To build:

mkdir build && cd build
cmake .. -DNWGRAPH_BUILD_EXAMPLES=ON
make

Data Requirements

These examples require external data files:

  • oracle.json - IMDB movie database in JSON format

  • dblp.json - DBLP bibliography data

See the examples/imdb/download.sh script for data download instructions.

Performance Considerations

These examples exercise NWGraph on large graphs (sizes are approximate):

  • IMDB: ~500K actors, millions of edges

  • DBLP: ~2M authors, millions of co-authorships

Key techniques for large-scale analysis:

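  1. Efficient data structures: compressed sparse row (CSR) storage keeps adjacency data in flat, contiguous arrays, which keeps memory overhead low

  2. Parallel algorithms: Threading Building Blocks (TBB) based parallelism for BFS

  3. Incremental construction: Building graphs from streaming data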
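
To make the CSR layout concrete, here is a minimal sketch of building it from an edge list. CsrGraph and build_csr are hypothetical names used for illustration only; NWGraph's own adjacency containers are richer than this.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal CSR container, not an NWGraph type. The neighbors of
// vertex u are targets[offsets[u]] .. targets[offsets[u + 1] - 1].
struct CsrGraph {
  std::vector<std::size_t> offsets;  // size: num_vertices + 1
  std::vector<std::size_t> targets;  // size: number of directed edges
};

CsrGraph build_csr(
    std::size_t num_vertices,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  CsrGraph g;
  g.offsets.assign(num_vertices + 1, 0);
  // Pass 1: count the out-degree of each source vertex.
  for (const auto& e : edges) {
    assert(e.first < num_vertices && e.second < num_vertices);
    ++g.offsets[e.first + 1];
  }
  // Prefix sum turns the counts into starting offsets.
  for (std::size_t i = 0; i < num_vertices; ++i)
    g.offsets[i + 1] += g.offsets[i];
  // Pass 2: place each target into its source vertex's slot range.
  g.targets.resize(edges.size());
  std::vector<std::size_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
  for (const auto& e : edges) g.targets[cursor[e.first]++] = e.second;
  return g;
}
```

Compared with a vector of vectors, this stores one offset per vertex and one entry per edge, with no per-vertex allocation. For an undirected graph such as the co-star graph, each edge would be inserted in both directions.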

See Also