April 12, 2023
Journal Article

Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods

Abstract

The validity of social science methods is in question because ground truth information is unavailable and likely impossible to obtain. The Defense Advanced Research Projects Agency's (DARPA's) Ground Truth (GT) program was designed to evaluate social modeling techniques through simulations with ground truth intentionally and systematically embedded, in order to understand complex Human Domain systems and their dynamics. Our multidisciplinary team of data scientists and computational social scientists, with expertise in machine learning, visual analytics, and human-machine teaming, investigated the accuracy, reproducibility, generalizability, and robustness of causal modeling approaches on fully observed and sampled simulated data of human social behavior. In addition, we evaluated the feasibility of using machine learning models to predict future social behavior with and without causal knowledge explicitly embedded. We first present our causal modeling approach to discovering the causal structure of the simulated data produced by TA1 performers. Our approach adapts state-of-the-art causal discovery (including ensemble models), machine learning, data analytics, and visualization techniques to allow a human-machine team to reverse-engineer unbiased and consistent estimates of causal relationships from sampled and full simulation data. We next present our reproducibility analysis of TA2 performance using a range of causal discovery models applied to both sampled and full data, and analyze their effectiveness and limitations. We further investigate the generalizability and robustness to sampling of state-of-the-art causal discovery approaches, including causal ensembles, on a range of internally simulated datasets with known ground truth.
Finally, we present and evaluate our agent-based approach to answer predict questions designed to anticipate social behavior using sampled research request data and a range of internally simulated datasets with known ground truth. Our experimental results not only reveal and empirically demonstrate the limitations of existing causal modeling approaches applied to large-scale, noisy, high-dimensional data with latent unobserved variables, mixed data types, and unknown relationships between them, but also outline lessons learned and recommendations to improve causal discovery and prediction of social behavior from observational data.

Citation

Volkova S., D.L. Arendt, E.G. Saldanha, M.F. Glenski, E.M. Ayton, J.A. Cottam, S.G. Aksoy, et al. 2023. Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods. Computational & Mathematical Organization Theory 29. PNNL-SA-156946. doi:10.1007/s10588-021-09351-y