January 20, 2026
Report

A case study in contrastive learning information combination: Application to technical forensics of additive manufacturing filament source identification

Abstract

Combining information from disparate data sources into a single decision is a core challenge in many fields, including technical forensics. Technical forensics (TF) uses technical characterization of a questioned sample to determine the properties of that sample; these properties are then used to infer information of forensic interest, such as provenance, age, or attribution. TF is used both in traditional forensic applications, such as the attribution of material fragments from an explosive, and in nuclear forensic applications, such as the attribution of actinides that have been interdicted out of regulatory control. The challenge of combining information from disparate sources, described by various terms including “data fusion” and “data integration”, is exacerbated in the technical forensics domain by at least two factors: the difficulty of interpreting each information source on its own, and the relatively small data sets available. An extensive literature exists on combining technical forensics information sources, through both manual and automated processes. These approaches are often bespoke to the specific information sources (such as the bi-, tri-, or quad-isotope chart (Moody, Grant, and Hutcheon 2005)), with some emerging examples of simple early- and late-fusion (, respectively).

In parallel with the information combination efforts described in the previous paragraph, the field of natural language processing attempted (and largely succeeded in) combining information from multiple non-technical information sources. The ecosystem of “multi-modal” language models, which can take text and images as input and generate text and images as output, had become large and diverse by 2025 (Khan et al. 2025).
In a generalized sense, many of these methods are trained by learning neural networks that convert raw text or images into a vector of numbers describing that text or image; these vectors are hereafter called “embeddings”, and the neural networks performing the conversion are called “embedders”. By using a separate embedder for text and for images, finding coincident text–image pairs (such as images with their captions), and optimizing the parameters of both embedders such that the embeddings of the text and the image are similar, the field has found a bridge between text and images (Girdhar et al. 2023). It is the contention of the authors of this report that this insight is not limited to text and images, but can instead be extended to any pair of modalities for which coincident observations can be found. The remainder of this report describes the application of this method to example multi-modal technical forensic data. Some details about the data used in this work are not appropriate for inclusion in this report and are provided in a companion report (PNNL-38669).
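The alignment objective described above can be sketched in a few lines. The following is a minimal illustrative example, not the authors' implementation: two hypothetical linear “embedders” map each modality into a shared vector space, and a symmetric InfoNCE-style contrastive loss rewards matched (coincident) pairs for being more similar than mismatched ones. All function and variable names here are invented for illustration.

```python
import numpy as np

def embed(x, W):
    """Toy linear 'embedder': project raw features into a shared
    space and L2-normalize so dot products are cosine similarities."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_loss(za, zb, temperature=0.1):
    """Symmetric InfoNCE-style loss: row i of za and row i of zb are a
    coincident pair and should be more similar than any mismatched pair."""
    logits = za @ zb.T / temperature            # pairwise similarity matrix
    labels = np.arange(len(za))                 # i-th a matches i-th b

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # Average over both directions (a -> b retrieval and b -> a retrieval).
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Two modalities with different raw feature dimensions, embedded into
# a common 4-dimensional space.
rng = np.random.default_rng(0)
xa = rng.normal(size=(8, 5))      # modality A: 8 samples, 5 raw features
xb = rng.normal(size=(8, 7))      # modality B: same 8 samples, 7 features
za = embed(xa, rng.normal(size=(5, 4)))
zb = embed(xb, rng.normal(size=(7, 4)))
loss = contrastive_loss(za, zb)
```

In practice the embedders would be trained networks and the loss would be minimized by gradient descent over their parameters; the point of the sketch is only that the objective depends on nothing modality-specific, which is why the same recipe can be applied to coincident technical forensic measurements.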
Citation

Hagen A.R., C.A. Nizinski, J.L. Yaros, C.C. Hayden, and N.H. Ly. 2025. A case study in contrastive learning information combination: Application to technical forensics of additive manufacturing filament source identification. Richland, WA: Pacific Northwest National Laboratory.