July 26, 2024
Conference Paper

VISILIENCE: An Interactive Visualization Framework for Resilience Analysis using Control-Flow Graph

Abstract

Soft errors have become a major concern for the resilience of high-performance computing (HPC) applications because they can cause serious outcomes such as Silent Data Corruptions (SDCs). Many approaches have been proposed to analyze the resilience of HPC applications. However, existing studies rarely address the challenge of interpreting the analysis results. Specifically, resilience analysis techniques often produce a massive volume of unstructured, non-intuitive raw data, which makes it difficult for programmers to conduct the analysis. Furthermore, different analysis models produce diverse results at multiple levels of detail, which makes it hard to compare and explore the resilience of HPC program execution. To this end, we present VISILIENCE, an interactive VISual resILIENCE analysis framework that helps programmers analyze the resilience of HPC applications. In particular, VISILIENCE uses the Control Flow Graph (CFG), an effective visualization approach, to present the execution of a function. In addition, three widely used resilience analysis models (i.e., Y-Branch, IPAS, and TRIDENT) are embedded into the framework for resilience analysis and result comparison. Multiple case studies demonstrate the effectiveness of VISILIENCE.

Published: July 26, 2024

Citation

Jiang H., S. Ruan, B. Fang, Y. Wang, and Q. Guan. 2023. VISILIENCE: An Interactive Visualization Framework for Resilience Analysis using Control-Flow Graph. In IEEE 28th Pacific Rim International Symposium on Dependable Computing (PRDC 2023), October 24-27, 2023, Singapore, 250-256. Piscataway, New Jersey: IEEE. PNNL-SA-169214. doi:10.1109/PRDC59308.2023.00041
