August 13, 2018
Conference Paper

Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection

Abstract

Strategies to detect, correct, or mitigate the impact of soft errors rely on error-injection experiments. For efficient evaluation, such experiments typically inject errors in software by sampling errors from a candidate distribution. Most often, these strategies randomly select and flip one bit in the output of an instruction. While single-bit flips might be a meaningful model for errors affecting hardware, whether this model is appropriate for software-level error injection has not been studied. In this paper, we study how errors affecting floating-point ALUs, while they execute candidate instructions, manifest in the output registers. We inject single-bit flips into the RTL descriptions of floating-point ALUs and analyze the differences between anticipated and observed outputs when executing floating-point addition, subtraction, multiplication, and division. We choose the operands for these instructions both randomly and from operand values observed in five benchmarks. We observe a rich distribution of errors in the output and analyze its implications for software-based fault-injection campaigns.
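To make the software-level error model in the abstract concrete, here is a minimal sketch of the common strategy it describes: compute a floating-point result, then flip one uniformly random bit of its 64-bit IEEE 754 encoding. It also counts how many bits differ between the anticipated and observed outputs, one simple way to compare them. This is an illustration only, not the paper's RTL-level injection method. The function names (`flip_bit`, `inject_output_flip`, `flipped_bit_count`) are hypothetical, and the sketch assumes double precision.

```python
import operator
import random
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its encoding inverted.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 64)")
    return bits_to_float(float_to_bits(x) ^ (1 << bit))


def inject_output_flip(op, a: float, b: float, rng: random.Random):
    """Hypothetical software-level model: compute op(a, b), then flip one
    uniformly chosen bit of the result. Returns (anticipated, observed, bit)."""
    anticipated = op(a, b)
    bit = rng.randrange(64)
    observed = flip_bit(anticipated, bit)
    return anticipated, observed, bit


def flipped_bit_count(anticipated: float, observed: float) -> int:
    """Hamming distance between the 64-bit encodings of two doubles."""
    return bin(float_to_bits(anticipated) ^ float_to_bits(observed)).count("1")


if __name__ == "__main__":
    rng = random.Random(42)
    for op in (operator.add, operator.sub, operator.mul, operator.truediv):
        anticipated, observed, bit = inject_output_flip(op, 1.5, 2.0, rng)
        print(op.__name__, bit, anticipated, observed,
              flipped_bit_count(anticipated, observed))
```

Under this software model, every injected error differs from the anticipated output in exactly one bit. The abstract reports that injecting faults into the RTL descriptions of the ALUs instead produces a richer distribution of output errors, which a Hamming-distance or relative-error tally over many trials could help characterize.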

Revised: May 11, 2020 | Published: August 13, 2018

Citation

Subasi O., C. Chang, M. Erez, and S. Krishnamoorthy. 2018. Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection. In Proceedings of the 47th International Conference on Parallel Processing (ICPP 2018), August 13-16, 2018, Eugene, OR, Article No. 59. New York, New York: ACM. PNNL-SA-134868. doi:10.1145/3225058.3225089