November 21, 2024
Report

The VIBES Are Shifting: Assessing Emergent Capabilities in Multi-Modal Models

Abstract

Researchers assessing open-source domains such as the internet, and particularly those studying information conflict, often have no single prescribed workflow. In the course of their research, they may need to perform a diverse array of tasks far beyond simply identifying an ever-changing set of objects. These analytical tasks can include ascertaining the provenance of images, understanding an image in the context of accompanying text-based data, or identifying indicators of digital image manipulation. They must further be able to do this at the scale of tens of thousands of images or more. Traditional machine vision models have typically lacked the flexibility and breadth of performance sufficient for these needs. The research team from Pacific Northwest National Laboratory assessed the performance of a single baseline CLIP ViT-L model against a series of analytical tasks relevant to the study of online information conflict.

Published: November 21, 2024

Citation

Butner R.S., A.J. Kingsland, S.A. Smith, K. Kingsland, B.T. Kennedy, and J.C. Chong. 2024. The VIBES Are Shifting: Assessing Emergent Capabilities in Multi-Modal Models. Richland, WA: Pacific Northwest National Laboratory.

Research topics