Abstract
Soft errors caused by transient bit flips can significantly alter an application's behavior. This has motivated a range of microarchitectural, architectural, compilation-based, and application-level techniques that detect, isolate, and correct soft errors to minimize their impact on the executing application. The first step toward designing good error detection and correction techniques is to understand an application's vulnerability to soft errors. To study the behavior of iterative methods in the presence of soft errors, we inject errors during the execution of these methods. In particular, we study the impact of a single error (single- or multi-bit) on the execution of iterative methods. We use real-world datasets from the SuiteSparse Matrix Collection (https://sparse.tamu.edu) and a widely used iterative solver library (Iterative Methods Library, IML++ v1.2a). We instrument the iterative solver implementations so that our error injection methodology can control the iteration, the vector, the position within the vector, the number of bits, and the bit positions of each injected error. We used 6 solvers and 28 datasets, performed a total of 1,744,800 error injection runs, and collected more than 2.5 TB of data.
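To make the injection parameters concrete, the following is a minimal Python sketch of a single- or multi-bit flip applied to one element of one solver vector at a chosen iteration. It is an illustration only and not the instrumentation used with IML++: the function names `flip_bits` and `inject`, the `spec` dictionary layout, and the assumption that vector entries are IEEE 754 double-precision values are all hypothetical.

```python
import struct


def flip_bits(value, bit_positions):
    """Return `value` with the listed bits of its 64-bit IEEE 754
    representation inverted (bit 0 is the least significant bit)."""
    # Reinterpret the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    for pos in bit_positions:
        if not 0 <= pos < 64:
            raise ValueError("bit position must be in [0, 63]")
        bits ^= 1 << pos
    # Reinterpret the modified integer as a double again.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits))
    return flipped


def inject(vectors, current_iter, spec):
    """Inject one error if `current_iter` matches the requested iteration.

    `vectors` maps a (hypothetical) vector name to a list of floats.
    `spec` holds the assumed injection parameters: "iteration", "vector",
    "index" (position within the vector), and "bits" (bit positions).
    Returns True if an error was injected, False otherwise.
    """
    if current_iter != spec["iteration"]:
        return False
    vec = vectors[spec["vector"]]
    vec[spec["index"]] = flip_bits(vec[spec["index"]], spec["bits"])
    return True
```

In this sketch, a solver loop would call `inject(vectors, k, spec)` once per iteration `k`. For example, with `spec = {"iteration": 3, "vector": "r", "index": 1, "bits": [63]}`, the sign bit of the second entry of the vector named `r` is flipped at iteration 3 and the run otherwise proceeds unchanged.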
Exploratory License
Eligible for exploratory license
Market Sector
Data Sciences