February 27, 2026
Report
AI-Ready Data Pilot Project Report
Abstract
The proliferation of artificial intelligence in scientific research has created an urgent need to define "AI-ready data" for researchers and, more importantly, to provide resources that help them produce it. At Pacific Northwest National Laboratory, we conducted a pilot study in which three data scientists evaluated three CSV datasets from different scientific domains, followed by semi-structured interviews that captured their assessment practices. Our findings reveal that AI-readiness evaluation is intuition-based, with practitioners asking, "How fast can I go from raw data to my machine learning pipeline?" The data scientists consistently prioritized workflow efficiency, human interpretability, and signals of quality stewardship. From these insights, we developed a practical evaluation framework of data requirements, metadata standards, and validation tests. The framework provides actionable criteria for producing and curating AI-ready datasets, bridging the gap between theoretical understanding and practical implementation.
Published: February 27, 2026
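To make the idea of machine-checkable "validation tests" concrete, the sketch below runs a few illustrative checks on raw CSV text. These checks are hypothetical examples chosen for illustration and are not the framework described in this report. They cover unique, non-empty header names, consistent row width, the fraction of missing values per column, and which columns parse as numeric. The function names `check_csv_readiness` and `is_number` and the `MISSING_TOKENS` set are assumptions introduced here. Only the Python standard library is used.

```python
import csv
import io

# Hypothetical set of tokens treated as "missing"; an assumption, not from the report.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null"}


def is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_csv_readiness(text: str) -> dict:
    """Run a few illustrative, machine-checkable tests on CSV text.

    The checks here are hypothetical examples of "validation tests",
    not the evaluation framework described in the report.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {"has_rows": False}
    header, body = rows[0], rows[1:]
    results = {
        "has_rows": True,
        # Header names should be non-empty and unique so columns can be addressed by name.
        "header_ok": all(h.strip() for h in header) and len(set(header)) == len(header),
        # 1-based row numbers (header = row 1) of data rows whose width differs from the header.
        "ragged_rows": [i + 2 for i, r in enumerate(body) if len(r) != len(header)],
        "missing_fraction": {},
        "numeric_columns": [],
    }
    # Only rows with the expected width are used for per-column statistics.
    well_formed = [r for r in body if len(r) == len(header)]
    for j, name in enumerate(header):
        values = [r[j].strip() for r in well_formed]
        present = [v for v in values if v.lower() not in MISSING_TOKENS]
        missing_count = len(values) - len(present)
        results["missing_fraction"][name] = missing_count / len(values) if values else 0.0
        # A column counts as numeric if every non-missing value parses as a float.
        if present and all(is_number(v) for v in present):
            results["numeric_columns"].append(name)
    return results
```

For example, on the hypothetical input `"site,temp_c,label\nA,21.5,ok\nB,,ok\nC,19.0\nD,NA,bad\n"`, the short `C` row is reported in `ragged_rows` as row 4. Two of the three well-formed `temp_c` values count as missing, and `temp_c` is the only column detected as numeric. Real readiness criteria would also draw on metadata and domain context, which simple checks like these cannot capture.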