Quickstart
NWGraph is a high-performance, header-only, generic C++ graph library based on the C++20 concepts and ranges language features. It provides sequential and parallel graph algorithms for well-known graph kernels, along with supporting data structures.
Project Organization
The organization of our library is as follows:
$NWGraph_HOME/
├── README.md
├── CMakeLists.txt
├── apb/
├── bench/
├── examples/
│ └── imdb/
├── include/
│ └── nwgraph/
│ ├── adaptors/
│ ├── algorithms/
│ ├── containers/
│ ├── experimental/
│ │ └── algorithms/
│ ├── graphs/
│ ├── io/
│ ├── util/
│ ├── graph_concepts.hpp
│ └── ...
├── test/
└── ...
The genericity of the algorithms in the NWGraph library stems from a taxonomy of graph concepts. These concepts are defined in the include/nwgraph/graph_concepts.hpp file. The header files containing the sequential and parallel graph algorithms for well-known graph kernels are under the $NWGraph_HOME/include/nwgraph/algorithms/ directory (some experimental algorithms are located in the $NWGraph_HOME/include/nwgraph/experimental/algorithms/ subdirectory). The header files for the range adaptors are under the $NWGraph_HOME/include/nwgraph/adaptors/ directory. The code for the benchmark applications is located in the $NWGraph_HOME/bench/ directory. The abstraction penalty benchmarks, which compare different containers and a variety of ways to iterate through a graph (including the use of graph adaptors), are under the $NWGraph_HOME/apb/ directory. Examples of how to use NWGraph can be found in the $NWGraph_HOME/examples/imdb/ directory.
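To give a flavor of concept-constrained generic code, here is a minimal, self-contained sketch in the spirit of the taxonomy in graph_concepts.hpp. The concept and function names below are illustrative assumptions, not the exact NWGraph identifiers:

#include <cstddef>
#include <iostream>
#include <vector>

// Illustrative only: model a graph as a range of vertex neighborhoods, each
// of which is itself a range. NWGraph's real concepts are richer than this.
template <class G>
concept adjacency_range = requires(G g) {
  g.begin();
  g.end();
};

// A generic algorithm constrained by the concept: count all edges.
template <adjacency_range Graph>
std::size_t count_edges(const Graph& g) {
  std::size_t n = 0;
  for (auto&& neighborhood : g)        // one entry per vertex
    for (auto&& edge : neighborhood) { // that vertex's edges
      (void)edge;
      ++n;
    }
  return n;
}

int main() {
  // The simplest structure satisfying the concept: a tiny directed graph.
  std::vector<std::vector<int>> g = {{1, 2}, {2}, {0}};
  std::cout << count_edges(g) << "\n"; // prints 4
}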
How to Compile
NWGraph uses Intel oneTBB as its parallel backend.
Requirements
CMake >= 3.20
g++ >= 11 (used with oneTBB as the parallel backend)
oneTBB >= 2021
You should be able to install CMake and g++ with your system's package manager (e.g., apt or Homebrew). oneTBB appears to be available in Homebrew for macOS 11.6 and later (and perhaps earlier).
Instructions for installing oneTBB with various Linux package managers can be found here:
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
Installation packages for oneAPI for Linux are available on intel.com:
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#onetbb
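For example (the package names below are assumptions and vary by distribution and version):
$ sudo apt install libtbb-dev   # Debian/Ubuntu
$ brew install tbb              # Homebrew on macOS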
Compilation
$ mkdir build; cd build
$ cmake ..
$ make -j4
Once compiled, the drivers of the graph benchmarks can be found under the $NWGraph_HOME/build/bench/ directory. The binaries of the abstraction penalty benchmarks are under the $NWGraph_HOME/build/apb/ directory. The binaries of the IMDB examples are under the $NWGraph_HOME/build/examples/ directory. The binaries of the examples that showcase the features of the NWGraph library are under the $NWGraph_HOME/build/test/ directory.
Useful things to know
To specify compiler:
$ cmake .. -DCMAKE_CXX_COMPILER=g++-11
To specify the build type as Release or Debug (default is Release):
$ cmake .. -DCMAKE_BUILD_TYPE=Release (or Debug)
To enable test cases and examples under the build/test/ directory:
$ cmake .. -DNW_GRAPH_BUILD_TESTS=ON (or OFF)
To generate the benchmark applications under the build/bench/ directory:
$ cmake .. -DNW_GRAPH_BUILD_BENCH=ON (or OFF)
To generate the abstraction penalty benchmarks under the build/apb/ directory:
$ cmake .. -DNW_GRAPH_BUILD_APBS=OFF (or ON)
To generate the example tools under the build/examples/ directory:
$ cmake .. -DNW_GRAPH_BUILD_EXAMPLES=OFF (or ON)
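These options can be combined. For example, to configure a Debug build with tests enabled:
$ cmake .. -DCMAKE_BUILD_TYPE=Debug -DNW_GRAPH_BUILD_TESTS=ON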
If CMake is not able to find TBB in its expected places, you may get an error during the cmake step. In this case, you need to set the TBBROOT environment variable to the location where oneTBB was installed. For example:
$ TBBROOT=/opt/intel/oneapi/tbb/2021.5.1 cmake ..
To see verbose information during compilation:
$ make VERBOSE=1
Running code in NWGraph
NWGraph uses docopt, a command-line interface description language, to define the interface of its command-line applications and abstraction penalty experiments.
A typical interface of a benchmark driver looks like this:
bfs.exe: breadth first search benchmark driver.
Usage:
bfs.exe (-h | --help)
bfs.exe -f FILE [-r NODE | -s FILE] [-i NUM] [-a NUM] [-b NUM] [-B NUM] [-n NUM] [--seed NUM] [--version ID...] [--log FILE] [--log-header] [-dvV] [THREADS]...
Options:
-h, --help show this screen
-f FILE input file path
-i NUM number of iteration [default: 1]
-a NUM alpha parameter [default: 15]
-b NUM beta parameter [default: 18]
-B NUM number of bins [default: 32]
-n NUM number of trials [default: 1]
-r NODE start from node r (default is random)
-s, --sources FILE sources file
--seed NUM random seed [default: 27491095]
--version ID algorithm version to run [default: 0]
--log FILE log times to a file
--log-header add a header to the log file
-d, --debug run in debug mode
-v, --verify verify results
-V, --verbose run in verbose mode
Each application takes options, followed by the arguments of those options, as inputs. A minimal example that takes a graph as input is as follows:
$ bfs.exe -f karate.mtx
Supported graph file formats
NWGraph recognizes the following file formats:
* Matrix Market Exchange Formats
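For reference, a minimal Matrix Market file describing an undirected triangle graph on three vertices looks like this:

%%MatrixMarket matrix coordinate pattern symmetric
3 3 3
2 1
3 1
3 2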
Running benchmarks
We have six main benchmarks: Breadth-first Search, Betweenness Centrality, Connected Component Decomposition, Page Rank, Single Source Shortest Path, and Triangle Counting.
Breadth-first Search
The default sequential version of BFS is version 0. The fastest parallel version of BFS is version 11, the direction-optimizing BFS. As an alternative to specifying one seed at a time, one or more sources can be provided to the BFS driver in a Matrix Market format file. The number of trials can be specified with -n. If no seed or sources file is provided, each trial generates one random number between 0 and |V|-1 as the source for BFS.
$ bench/bfs.exe -f karate.mtx --seed 0 --version 11 -n 3
Connected Component Decomposition
The default sequential version of CC is version 0. The fastest parallel version of CC is version 7, Afforest.
$ bench/cc.exe -f karate.mtx --relabel --direction ascending
Page Rank
The fastest parallel version of PR is version 11 (the default). The maximum number of iterations can be set with -i.
$ bench/pr.exe -f karate.mtx -i 1000
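For intuition, here is one Page Rank iteration in its textbook form; this is a generic sketch over an illustrative CSR layout, not NWGraph's kernel (d is the damping factor):

#include <cstddef>
#include <vector>

// One power-iteration step:
//   next[v] = (1-d)/N + d * sum over in-neighbors u of rank[u]/outdeg[u].
// ptr/idx encode the incoming edges of each vertex v; the updated ranks are
// written into next, which must be presized to the number of vertices.
void pagerank_step(const std::vector<std::size_t>& ptr,
                   const std::vector<std::size_t>& idx,
                   const std::vector<std::size_t>& outdeg,
                   const std::vector<double>& rank,
                   std::vector<double>& next, double d) {
  std::size_t n = rank.size();
  for (std::size_t v = 0; v < n; ++v) {
    double sum = 0.0;
    for (std::size_t e = ptr[v]; e < ptr[v + 1]; ++e) {
      std::size_t u = idx[e];        // an in-neighbor of v
      sum += rank[u] / outdeg[u];
    }
    next[v] = (1.0 - d) / n + d * sum;
  }
}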
Single Source Shortest Path
The default sequential version of SSSP is version 0. The fastest parallel version of SSSP is version 12, Delta-stepping. As an alternative to specifying one seed at a time, one or more sources can be provided to the SSSP driver in a Matrix Market format file. The number of trials can be specified with -n. If no seed or sources file is provided, each trial generates one random number between 0 and |V|-1 as the source for SSSP.
$ bench/sssp.exe -f karate.mtx --seed 0 -n 3
Triangle Counting
The default sequential version of TC is version 0. The fastest parallel version of TC is version 4.
$ bench/tc.exe -f karate.mtx --version 4 --relabel --upper
Betweenness Centrality
The default sequential version of BC is version 0. The fastest parallel version of BC is version 5. As an alternative to specifying one seed at a time, one or more sources can be provided to the BC driver in a Matrix Market format file.
$ bench/bc.exe -f karate.mtx --version 5 --seed 0
Other useful things
Note that the following features may or may not be available in every benchmark.
Relabel-by-degree
Relabeling vertices by degree (also known as column/row permutation in matrix-matrix multiplication) may speed up a graph algorithm by improving its workload distribution and memory access pattern. To enable relabel-by-degree and relabel vertices in ascending order of degree:
$ bench/cc.exe -f karate.mtx --relabel --direction ascending
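The idea behind relabel-by-degree, as a sketch (not NWGraph's implementation): compute a permutation that orders vertex ids by degree, then renumber the graph with it.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Order vertex ids by degree; perm[new_id] == old_id.
std::vector<std::size_t> degree_order(const std::vector<std::size_t>& degree,
                                      bool ascending) {
  std::vector<std::size_t> perm(degree.size());
  std::iota(perm.begin(), perm.end(), 0); // identity permutation 0..n-1
  std::sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
    return ascending ? degree[a] < degree[b] : degree[a] > degree[b];
  });
  return perm;
}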
Upper Triangular Order
Triangle counting allows relabeling the graph in upper (or lower) triangular order, which can greatly improve the performance of the algorithm. To enable relabel-by-degree and relabel the graph in upper triangular order:
$ bench/tc.exe -f karate.mtx --relabel --upper
Verifier
We implement a verifier in each benchmark to check the correctness of the algorithm. To enable verification of the results:
$ bench/cc.exe -f karate.mtx -v
or
$ bench/cc.exe -f karate.mtx --verify
Multi-threading
Each benchmark has both a sequential version and a parallel version of the algorithm. When a parallel algorithm is selected, multi-threading is enabled by default, and the number of threads is set to the maximum number of cores available on the machine. To run with a different number of threads, such as 128:
$ bench/cc.exe -f karate.mtx 128
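Under the hood, oneTBB provides a standard mechanism for capping its worker pool, sketched below; whether the drivers use exactly this construct is an assumption:

#include <oneapi/tbb/global_control.h>

int main() {
  // Limit oneTBB to at most 128 worker threads while `limit` is in scope.
  oneapi::tbb::global_control limit(
      oneapi::tbb::global_control::max_allowed_parallelism, 128);
  // ... run the parallel algorithm here ...
}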
Benchmarking with GAP Datasets
To reproduce the performance results reported in the PVLDB paper on NWGraph, "NWGraph: A Library of Generic Graph Algorithms and Data Structures in C++20", follow these steps:
1. Download the GAP datasets from the SuiteSparse Matrix Collection in Matrix Market format.
2. Run the graph benchmarks with the GAP datasets.
Note that BFS and SSSP are run with 64 sources provided in a Matrix Market file, and BC is run with 4 sources. For PR, the maximum number of iterations is set to 1000.
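For example (the SuiteSparse download URL and archive layout below are assumptions and may change):
$ wget https://suitesparse-collection-website.herokuapp.com/MM/GAP/GAP-road.tar.gz
$ tar xzf GAP-road.tar.gz
$ bench/bfs.exe -f GAP-road/GAP-road.mtx -n 64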
Benchmarking abstraction penalties
What is abstraction penalty?
There are two types of abstraction penalties here. Using a range-based interface introduces a variety of different ways to iterate through a graph (including the use of graph adaptors). While ranges and range-based for loops are useful programming abstractions, it is important to consider any performance penalties associated with their use. We benchmark these penalties to ensure they do not significantly limit performance compared to a raw for-loop implementation.
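A sketch of the two iteration styles being compared, over an illustrative CSR layout (not NWGraph's actual containers):

#include <cstddef>
#include <span>
#include <vector>

struct csr {
  std::vector<std::size_t> ptr; // ptr[v]..ptr[v+1] delimits v's neighbors
  std::vector<std::size_t> idx; // flat array of neighbor ids
};

// Style 1: raw index-based for loops.
std::size_t sum_raw(const csr& g) {
  std::size_t sum = 0;
  for (std::size_t v = 0; v + 1 < g.ptr.size(); ++v)
    for (std::size_t e = g.ptr[v]; e < g.ptr[v + 1]; ++e)
      sum += g.idx[e];
  return sum;
}

// Style 2: range-based for loops over per-vertex neighborhood views.
std::size_t sum_ranges(const csr& g) {
  std::size_t sum = 0;
  for (std::size_t v = 0; v + 1 < g.ptr.size(); ++v) {
    std::span nbhd(g.idx.data() + g.ptr[v], g.ptr[v + 1] - g.ptr[v]);
    for (auto u : nbhd) sum += u;
  }
  return sum;
}

The benchmark asks whether the second style compiles down to the same code as the first.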
We also evaluate the abstraction penalty incurred by storing a graph in different containers. In particular, we have selected the struct_of_array, vector_of_vector, vector_of_list, and vector_of_forward_list containers.
Running abstraction penalty experiments
For example, let us consider the sparse matrix-dense vector multiplication (SpMV) kernel used in Page Rank, which multiplies the adjacency matrix representation of a graph by a dense vector x and stores the result in another vector y. To experimentally evaluate the abstraction penalty of different ways to iterate through a graph:
$ apb/spmv.exe -f karate.mtx
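The kernel itself, in a minimal sketch over an illustrative CSR layout with edge weights (not NWGraph's actual implementation):

#include <cstddef>
#include <vector>

// y = A * x, where A is the weighted adjacency matrix in CSR form.
// y must be presized to the number of vertices.
void spmv(const std::vector<std::size_t>& ptr,
          const std::vector<std::size_t>& idx,
          const std::vector<double>& val,
          const std::vector<double>& x,
          std::vector<double>& y) {
  for (std::size_t v = 0; v + 1 < ptr.size(); ++v) {
    double sum = 0.0;
    for (std::size_t e = ptr[v]; e < ptr[v + 1]; ++e)
      sum += val[e] * x[idx[e]]; // row v of A dotted with x
    y[v] = sum;
  }
}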
To experimentally evaluate the abstraction penalty of different containers for storing a graph:
$ apb/containers -f karate.mtx --format CSR --format VOV --format VOL --format VOF