Conference Paper

The Effect of Antagonistic Behavior in Reinforcement Learning

Abstract

The significant achievements of deep reinforcement learning (RL) have motivated researchers to investigate its shortcomings as well. Such work has shown that typical methods in deep RL tend to produce brittle policies that overfit to the training environment. In this paper, we introduce the notion of purely antagonistic behavior in value-based agents, where the objective is not to maximize reward but to minimize the victim’s value over time. This notion is motivated by the scenario in which an antagonistic human architect, without access to the environment’s reward function, wants to build an RL agent that can impede another well-trained RL victim agent. First, we formalize antagonistic behavior in RL. Then, we present experiments showing how a purely antagonistic agent performs compared to a well-trained victim that learns directly from the game’s rewards. Our results suggest that if one’s goal is to find vulnerabilities in well-trained agents, direct access to the environment’s rewards is not necessary, and antagonistic behavior can be measured independently of environment wins and losses.
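The core idea above, an agent that acts to drive down the victim’s value estimate rather than to collect environment reward, can be sketched in a few lines. The snippet below is a minimal illustration, not the paper’s implementation: `victim_value_fn`, `env_model`, and the greedy action rule are hypothetical stand-ins assumed for exposition.

```python
import numpy as np

def antagonist_reward(victim_value_fn, state, next_state):
    # Surrogate learning signal for the antagonist: the drop in the
    # victim's estimated value between states. Positive when the
    # victim's prospects worsen. (Illustrative only; the paper's
    # exact formulation may differ.)
    return victim_value_fn(state) - victim_value_fn(next_state)

def greedy_antagonist_action(env_model, victim_value_fn, state, actions):
    # Pick the action that minimizes the victim's value of the
    # resulting state, given a one-step environment model. Both
    # callables are hypothetical stand-ins.
    next_values = [victim_value_fn(env_model(state, a)) for a in actions]
    return actions[int(np.argmin(next_values))]
```

Note that neither function touches the environment’s own reward, consistent with the premise that the antagonistic architect has no access to it.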

Published: September 23, 2021

Citation

Fujimoto, T.C., T.J. Doster, A. Attarian, J.M. Brandenberger, and N.O. Hodas. 2021. The Effect of Antagonistic Behavior in Reinforcement Learning. In AAAI-21 Workshop on Reinforcement Learning in Games, February 8, 2021, Virtual. Menlo Park, California: Association for the Advancement of Artificial Intelligence. PNNL-SA-157666.