Data Lakehouse
PermitAI is developing an enriched, large-scale NEPA database that provides seamless access to thousands of historical NEPA review documents.
Released to the public in June 2024, the NEPA Text Corpus (NEPATEC) 1.0 is the first iteration of this comprehensive database, consisting of more than 28,000 documents from nearly 3,000 projects across more than 100 agencies. NEPATEC2.0, an expanded corpus of public NEPA documents, consists of more than 120,000 documents from 60,000 projects prepared by more than 60 different agencies. Modeled to align with the Council on Environmental Quality's (CEQ) metadata standards, NEPATEC2.0 promotes consistency in environmental reviews and supports the ongoing effort to modernize permitting technologies by facilitating more transparent, efficient, and data-driven decision-making. The database can also extend beyond NEPA documents to include other permitting review documents.
PermitAI has enhanced the database's capabilities by integrating offline data from various agencies and ensuring continuous updates through a newly developed API that connects directly to the Environmental Protection Agency.
Beyond data aggregation, PermitAI is pioneering standardization and augmentation techniques with large language models that will greatly improve data quality and accessibility. This work includes building NEPA data connections, which cover data on projects, processes, documents, and public involvement, as well as comments and geographic information system (GIS) records.
PermitAI Data Lakehouse Team
- Tim Vega (Data, Technical Lead)
- Dan Nally (Data, Domain Lead)
- Sai Koneru
- Kathy Nwe
- Heng Wan
- Alex Buchko
- Taylor Edwards
- Cleve Davis
- Kaustav Bhattacharjee
- Siddhartha Das
- Ashik Islam
- Lauren Phillips