In this study, we explore three strategies to mitigate trust issues following errors produced by Large Language Models (LLMs). LLMs have the potential to support a wide range of tasks, but errors can erode trust and reduce user reliance on the system; restoring appropriate user trust after an error is therefore critical. This study examined the impact of confidence scores, explanations of the system's capabilities, and user feedback and control on user trust following an error. Sixty-eight participants viewed an LLM's responses to 20 general trivia questions, with an error introduced on the third trial. Each participant was presented with one of the mitigative strategies. For each trivia answer, participants rated their overall trust in the model and the reliability of the answer. Results showed an immediate drop in trust after the error; trust also recovered far more slowly than perceived reliability. There were no differences across the three strategies in trust recovery; all conditions showed an equivalent logarithmic trend in trust recovery following the system error. Differences in overall trust were predicted by the perceived reliability of each answer, suggesting that participants evaluated results critically and used those evaluations to inform their trust in the model. Qualitative data supported this finding; participants expressed lasting distrust despite the LLM's later accuracy. Results underscore the need to prioritize accuracy in LLM deployment, because early errors may irrevocably damage user trust calibration and later adoption. Mitigation strategies may prove most beneficial when models are reasonably skilled during the trust calibration process.
Published: April 23, 2025
Citation
Martell, M.J., J.A. Baweja, and B.D. Dreslin. 2025. Mitigative Strategies for Recovering from Large Language Model Trust Violations. Journal of Cognitive Engineering and Decision Making 19, no. 1: 76-95. PNNL-SA-193711. doi:10.1177/15553434241303577