Graph algorithms with polynomial space and time requirements often become infeasible for massive graphs with billions of edges or more. State-of-the-art approaches therefore employ approximate serial, parallel, and distributed algorithms to tackle these challenges. However, such approaches require storing the entire graph in memory and thus need access to costly computing resources such as clusters and supercomputers. In this paper, we present practical streaming approaches for solving massive graph problems using limited memory for two prototypical graph problems: maximum weighted matching and minimum weighted edge cover. For matching, we conduct a thorough computational study on two of the semi-streaming algorithms including a recent breakthrough result that achieves a $1/(2+\varepsilon)$-approximation of the weight while using $O( n \log W /\epsilon)$ memory (here $n$ is the number of vertices and $W$ is the maximum edge weight), designed by Paz and Schwartzman [SODA, 2017]. Empirically, we show that the semi-streaming algorithms produce matchings whose weight is close to the best $1/2$-approximate offline algorithm while requiring less time and an order-of-magnitude less memory.
For minimum weighted edge cover, we develop three novel semi-streaming algorithms. Two of these algorithms require a single pass through the input graph, require $O(n \log n)$ memory, and provide a 2-approximation guarantee on the objective. We also leverage a relationship between approximate maximum weighted matching and approximate minimum weighted edge cover to develop a two-pass $3/2+\epsilon$-approximate algorithm with the memory requirement of Paz and Schwartzman's semi-streaming matching algorithm. These streaming approaches are compared against the state-of-the-art 3/2-approximate offline algorithm.
The semi-streaming matching and the novel edge cover algorithms proposed in this paper can process graphs with several billions of edges in under 30 minutes using 6 GB of memory, which is at least an order of magnitude improvement from the offline (non-streaming) algorithms. For the largest graph, the best alternative offline parallel approximation algorithm (GPA+ROMA) could not finish in three hours even while employing hundreds of processors and 1 TB of memory. We also demonstrate an application of the semi-streaming algorithm by computing a matching using linearly bounded memory on item intersection graphs derived from three machine learning datasets, whereas the existing offline algorithms could not complete on one of these datasets since their memory requirements exceeded 1TB.
Published: August 6, 2024
Citation
Ferdous S.M., A. Pothen, and M. Halappanavar. 2024.Streaming Matching and Edge Cover in Practice. In Proceedings of the 22nd International Symposium on Experimental Algorithms (SEA 2024), July 24-26, Vienna, Austria. Leibniz International Proceedings in Informatics (LIPIcs), 301, 12:1-12:22.PNNL-SA-197168.doi:10.4230/LIPIcs.SEA.2024.12