December 28, 2016
Conference Paper

PRESAGE: Protecting Structured Address Generation against Soft Errors

Abstract

Modern computer scaling trends in pursuit of larger component counts and power efficiency have, unfortunately, led to less reliable hardware and, consequently, to soft errors escaping into application data ("silent data corruptions"). Techniques to enhance system resilience hinge on the availability of efficient error detectors that have high detection rates, low false positive rates, and low computational overhead. Unfortunately, efficient detectors for faults during address generation have not been widely researched (especially in the context of indexing large arrays). We present a novel lightweight compiler-driven technique called PRESAGE for detecting bit-flips affecting structured address computations. A key insight underlying PRESAGE is that any address computation scheme that propagates an already incurred error is better than a scheme that corrupts one particular array access but otherwise (falsely) appears to compute perfectly. Ensuring the propagation of errors allows one to place detectors at loop exit points and helps turn silent corruptions into easily detectable error situations. Our experiments using the PolyBench benchmark suite indicate that PRESAGE-based error detectors have a high error-detection rate while incurring low overheads.

Revised: March 22, 2017 | Published: December 28, 2016

Citation

Sharma V.C., G. Gopalakrishnan, and S. Krishnamoorthy. 2016. PRESAGE: Protecting Structured Address Generation against Soft Errors. In IEEE 23rd International Conference on High Performance Computing (HiPC 2016), December 19-22, 2016, Hyderabad, India. Los Alamitos, California: IEEE Computer Society. PNNL-SA-124186. doi:10.1109/HiPC.2016.037