August 26, 2025
Report
NEPATEC v2.0: Standardized Metadata and Text Corpus of National Environmental Policy Act Documents
Abstract
The National Environmental Policy Act of 1969, as amended (NEPA), is a major environmental law in the United States that requires Federal agencies to consider and document the potential environmental impacts of a proposed action before deciding on it. Modernizing NEPA and permitting processes is difficult because there are no standardized formats or interoperable systems for organizing and sharing NEPA-related information across agencies. Much of the information gathered during NEPA reviews is written into documents such as categorical exclusions, environmental assessments, and environmental impact statements. These documents are then filed in predominantly independent agency file stores that may or may not be publicly accessible. Applying metadata and data standards, such as those recommended by the Council on Environmental Quality (CEQ), to NEPA documents provides a shared vocabulary and structure for key entities such as projects, processes, and documents. This can streamline information exchange and enhance collaboration across systems. In this work, we publicly release NEPATEC2.0, an expanded corpus of NEPA documents with associated metadata. NEPATEC2.0 contains approximately 120,000 documents from 60,000 projects prepared by more than 60 different agencies. Because it is modeled to align with CEQ metadata standards, NEPATEC2.0 promotes consistency in environmental reviews and supports ongoing efforts to modernize permitting technologies by enabling more transparent, efficient, and data-driven decision-making. Importantly, NEPATEC2.0 also demonstrates the possibilities and limitations of large language model (LLM)-based prompting for extracting information from NEPA documents at scale.
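Metadata standards of the kind described above provide a shared structure for projects, processes, and documents. As a rough illustration only, the sketch below models that nesting with Python dataclasses and tallies documents by type across projects. All class names, field names, and sample values are hypothetical assumptions. They are not the CEQ standard or the actual NEPATEC2.0 schema.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical, simplified entities loosely following the project/process/document
# structure described in the abstract. Field names are illustrative assumptions,
# not the CEQ metadata standard or the NEPATEC2.0 schema.
@dataclass
class Document:
    document_id: str
    document_type: str  # e.g. "CE", "EA", or "EIS"
    title: str


@dataclass
class Process:
    process_id: str
    lead_agency: str
    documents: list[Document] = field(default_factory=list)


@dataclass
class Project:
    project_id: str
    title: str
    processes: list[Process] = field(default_factory=list)


def count_documents_by_type(projects: list[Project]) -> Counter:
    """Tally documents by type across all projects and their review processes."""
    counts: Counter = Counter()
    for project in projects:
        for process in project.processes:
            for doc in process.documents:
                counts[doc.document_type] += 1
    return counts


def agencies(projects: list[Project]) -> set[str]:
    """Collect the distinct lead agencies represented in a list of projects."""
    return {proc.lead_agency for p in projects for proc in p.processes}


if __name__ == "__main__":
    # Hypothetical sample records for demonstration only.
    demo = [
        Project(
            project_id="P-1",
            title="Hypothetical solar facility",
            processes=[
                Process(
                    process_id="R-1",
                    lead_agency="Agency A",
                    documents=[
                        Document("D-1", "EA", "Draft EA"),
                        Document("D-2", "EA", "Final EA"),
                    ],
                )
            ],
        ),
        Project(
            project_id="P-2",
            title="Hypothetical trail maintenance",
            processes=[Process("R-2", "Agency B", [Document("D-3", "CE", "CE record")])],
        ),
    ]
    print(count_documents_by_type(demo))
    print(sorted(agencies(demo)))
```

Nesting documents under processes, and processes under projects, reflects the fact that one project may go through more than one review and that each review can produce several documents. A real metadata schema would carry many more fields, such as locations, dates, and cooperating agencies.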