July 28, 2020
Conference Paper

Probing for Artifacts: Detecting Imagenet Model Evasions

Abstract

While deep learning models have made incredible progress across a variety of machine learning tasks, they remain vulnerable to adversarial examples crafted to fool otherwise trustworthy models. In this work we approach this problem through the lens of a detection framework. We propose a classification network that takes the hidden-layer activations of a trained model as input and detects adversarial artifacts in the example being classified. We train this classification network simultaneously against multiple adversarial algorithms to create a more robust detector, and we show higher detection rates than several alternatives. The novelty of our approach lies in the scale and scope of probing ImageNet models for adversarial artifacts. In addition, we propose an improvement to feature squeezing, another common adversarial example detection method.
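As a rough illustration of the activation-probing idea (not the authors' implementation), the sketch below stands in a tiny two-layer network for a trained ImageNet model, perturbs clean inputs with crude sign noise as a stand-in for a real adversarial algorithm, and trains a logistic-regression detector on the hidden-layer activations. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained model: a tiny 2-layer MLP.
# In the paper, activations would be probed from a real ImageNet network.
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 4))

def forward_with_activations(x):
    """Return model output and the hidden-layer activations we probe."""
    h = np.maximum(x @ W1, 0.0)  # hidden ReLU activations
    return h @ W2, h

# Detector training set: activations of clean vs. perturbed inputs.
clean = rng.standard_normal((200, 16))
adv = clean + 0.8 * np.sign(rng.standard_normal(clean.shape))  # crude sign-noise perturbation
_, acts = forward_with_activations(np.vstack([clean, adv]))
labels = np.concatenate([np.zeros(len(clean)), np.ones(len(adv))])

# Logistic-regression detector on the activations (plain gradient descent).
w = np.zeros(acts.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
    grad = p - labels
    w -= 0.1 * acts.T @ grad / len(labels)
    b -= 0.1 * grad.mean()

preds = (1.0 / (1.0 + np.exp(-(acts @ w + b)))) > 0.5
accuracy = (preds == labels).mean()
```

In the paper's setting the detector is trained against multiple adversarial algorithms at once; here a single noise model is used purely to keep the sketch self-contained.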
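For context on the second contribution, the baseline feature squeezing detector (Xu et al.) compares a model's prediction on an input with its prediction on a "squeezed" copy, e.g. one reduced to a lower bit depth; a large L1 distance between the two prediction vectors flags a likely adversarial example. A minimal sketch of that baseline, with a hypothetical linear model in place of a real classifier, is:

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Squeeze pixel values in [0, 1] down to 2**bits discrete levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical model: class probabilities from a fixed linear map.
rng = np.random.default_rng(1)
W = rng.standard_normal((64, 10))

def predict(x):
    return softmax(x.reshape(-1) @ W)

def squeezing_score(x, bits=3):
    """L1 distance between predictions on the raw and squeezed input;
    larger scores suggest the input is adversarial."""
    return np.abs(predict(x) - predict(reduce_bit_depth(x, bits))).sum()

x = rng.random((8, 8))  # toy "image" with pixels in [0, 1]
score = squeezing_score(x)
```

The paper's improvement to this scheme is not reproduced here; the sketch only shows the standard detection criterion the improvement builds on.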

Revised: October 14, 2020 | Published: July 28, 2020

Citation

Rounds, J., A.J. Kingsland, M.J. Henry, and K.R. Duskin. 2020. Probing for Artifacts: Detecting Imagenet Model Evasions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2020), June 14-19, 2020, Seattle, WA, 3432-3441. Piscataway, New Jersey: IEEE. PNNL-SA-152048. doi:10.1109/CVPRW50498.2020.00403