February 27, 2026
Report

AI-Ready Data Pilot Project Report

Abstract

The proliferation of artificial intelligence in scientific research has created an urgent need to define "AI-ready data" for researchers and, more importantly, provide resources to help them produce AI-ready data. At Pacific Northwest National Laboratory, we conducted a pilot study with three data scientists evaluating three CSV datasets from different scientific domains, followed by semi-structured interviews capturing assessment practices. Our findings reveal that AI-readiness evaluation is intuition-based, with practitioners asking "How fast can I go from raw data to my machine learning pipeline?" Data scientists consistently prioritized workflow efficiency, human interpretability, and quality stewardship signals. From these insights, we developed a practical evaluation framework comprising data requirements, metadata standards, and validation tests that provides actionable criteria for producing and curating AI-ready datasets, addressing the gap between theoretical understanding and practical implementation.

Published: February 27, 2026

Citation

Sheridan S.A., J.M. Schneider, and M. Klein. 2026. AI-Ready Data Pilot Project Report. Richland, WA: Pacific Northwest National Laboratory.

Research topics