Imagine you’re scrolling through social media. A video catches your eye: maybe it’s a reel of a new coffeemaker, or a video clip of a hair product. Suddenly, you start seeing the product everywhere on social media, and everyone you know has seen it too. This is no coincidence. This is influence maximization.
Viral marketing isn’t the only “viral” application of the influence maximization problem. It can also be used to identify superspreaders of disease in epidemiological studies. In either of these cases, identifying the most influential players can be a computational challenge.
The optimization framework Ripples can help by cutting computational runtimes from hours to less than one minute. This influence maximization software package was developed at Pacific Northwest National Laboratory (PNNL) with contributions from Washington State University (WSU) and North Carolina State University (NCSU). PNNL data scientist Marco Minutoli will present the latest findings using Ripples during the upcoming International Conference for High Performance Computing, Networking, Storage, and Analysis on November 15.
Efficient graph exploration
Social media and disease-spread networks can be thought of as graphs. Each person is a “node” in the graph, and each connection between two people is an “edge.” Using graph analytics, researchers can explore these networks to find the most influential nodes.
Doing so, however, is a computational challenge. Each edge must be assigned a value corresponding to the strength of the connection between its two nodes, and it’s not always clear what value to use in each situation. Running simulations helps clarify these values.
“With each repeated simulation, our confidence in the computed solution increases,” said Mahantesh Halappanavar, chief data scientist and leader of the Data Sciences & Machine Intelligence group at PNNL.
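The idea behind repeated simulation can be sketched in a few lines of Python. This is a minimal illustration, not code from Ripples: it assumes an independent-cascade style model in which each edge passes influence along with a fixed probability, and the names `simulate_spread`, `estimate_spread`, and `chain_graph`, along with the edge probabilities, are hypothetical. The sketch averages many random simulations and reports a standard error, which shrinks as the number of simulations grows.

```python
import random
import statistics

def simulate_spread(graph, seeds, rng):
    """Run one random cascade and return how many nodes end up activated.

    graph maps node -> list of (neighbor, probability) pairs (assumed format).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, prob in graph.get(node, []):
                # Each edge gets one chance to activate an inactive neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)

def estimate_spread(graph, seeds, num_simulations, seed=0):
    """Average many simulations; return (mean spread, standard error)."""
    rng = random.Random(seed)
    samples = [simulate_spread(graph, seeds, rng) for _ in range(num_simulations)]
    mean = statistics.fmean(samples)
    std_error = statistics.stdev(samples) / num_simulations ** 0.5
    return mean, std_error

# Hypothetical three-person chain; the 0.5 probabilities are made up.
chain_graph = {"a": [("b", 0.5)], "b": [("c", 0.5)], "c": []}

mean_small, err_small = estimate_spread(chain_graph, ["a"], 100)
mean_large, err_large = estimate_spread(chain_graph, ["a"], 10_000)
print(f"100 runs:    {mean_small:.3f} +/- {err_small:.3f}")
print(f"10,000 runs: {mean_large:.3f} +/- {err_large:.3f}")
```

For this toy chain the exact expected spread is 1 + 0.5 + 0.25 = 1.75, so the estimates can be checked by hand. The standard error falls roughly with the square root of the number of runs, which is one way to read the growing confidence described in the quote above.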
One way to run these simulations is through a “breadth-first probabilistic traversal” of the graph. This thorough but laborious process visits every node and edge of the graph in a well-defined order, which takes a lot of computation time. To speed it up, researchers developed a “fused” method that uses approximations to reduce the number of edge visits and parallel implementations to improve performance, then incorporated it into Ripples.
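A plain, unfused version of such a traversal might look like the Python sketch below. It is an illustration under stated assumptions, not the Ripples implementation: the function name `probabilistic_bfs`, the adjacency-list format, and the toy probabilities are all hypothetical. The sketch also counts how many edges it examines, since that count is the cost a fused method would aim to reduce.

```python
import random
from collections import deque

def probabilistic_bfs(graph, source, rng):
    """Breadth-first traversal that follows each edge only with its probability.

    graph maps node -> list of (neighbor, probability) pairs (assumed format).
    Returns (set of reached nodes, number of edges examined).
    """
    reached = {source}
    frontier = deque([source])
    edges_examined = 0
    while frontier:
        node = frontier.popleft()  # first in, first out gives breadth-first order
        for neighbor, prob in graph.get(node, []):
            edges_examined += 1
            # Flip a biased coin to decide whether this edge is traversed.
            if neighbor not in reached and rng.random() < prob:
                reached.add(neighbor)
                frontier.append(neighbor)
    return reached, edges_examined

# Hypothetical toy network. Probabilities of 1.0 and 0.0 make the
# outcome deterministic so the result is easy to verify by hand.
toy_graph = {
    "a": [("b", 1.0), ("c", 0.0)],
    "b": [("d", 1.0)],
    "c": [("d", 1.0)],
    "d": [],
}
reached, edges_examined = probabilistic_bfs(toy_graph, "a", random.Random(0))
print(sorted(reached), edges_examined)  # ['a', 'b', 'd'] 3
```

Here the edge from "a" to "c" is never traversed, so "c" and its outgoing edge are skipped entirely. Every simulation repeats this edge-by-edge work, which is why the number of edge visits dominates the running time on large graphs.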
“We sped up graph traversals by 10x through our method,” said Minutoli.
Scaling to the exascale
Exascale systems—like Frontier, the world’s fastest supercomputer located at Oak Ridge National Laboratory—are equipped to handle very large computational problems. Minutoli and his team tested their latest version of Ripples using varying numbers of nodes, GPUs, and CPU cores to see if it could still work as intended on an exascale system.
“We saw strong scaling results with Ripples on both single-node multi-GPU and multi-node heterogeneous systems,” said Minutoli.
Ripples was initially developed by Minutoli with contributions from Halappanavar and PNNL computer scientist Antonino Tumeo, as well as collaborators Ananth Kalyanaraman (WSU) and Maurizio Drocco (IBM T.J. Watson Research Center). Reece Neff, a graduate student at NCSU, worked on this effort as a summer research intern at PNNL, where he is now a Distinguished Graduate Research Fellow funded through the Department of Defense. Mostafa Eghbali Zarch and Michela Becchi of NCSU have also contributed to Ripples since its initial development. This research was supported by the ExaGraph project of the Department of Energy’s Exascale Computing Project.