Models and Benchmarks Thrust

The Models and Benchmarks team at PermitAI is dedicated to advancing and optimizing AI models for environmental review and permitting, with a particular focus on tasks specific to the National Environmental Policy Act (NEPA). The team has analyzed several off-the-shelf large language models and found that standard models often fall short on the specialized demands of NEPA work. In response, the team has used the NEPATEC database to create and train efficient bespoke models that excel at summarizing, retrieving information, and answering questions pertinent to NEPA. Alongside the models, the team is developing a suite of benchmarks to evaluate model performance on NEPA-specific tasks.

Custom Model Development

The team's approach prioritizes smaller language models, ranging from 1 to 7 billion parameters, which balance performance against resource consumption and so keep inference costs and energy usage low. These models are tailored for tasks including comment processing evaluation, GIS analytics, and NEPA document drafting.

Benchmarking and Evaluation

A core initiative of this thrust area is the creation of NEPA-Bench, a comprehensive set of benchmarks that assess AI model performance on real-world NEPA tasks. This suite features:

NEPAQuAD: Tests models' factual knowledge of NEPA.
Comment-Bench: Evaluates summarization and categorization of public comments.
EIS-Bench: Evaluates extraction of relevant metadata from lengthy Environmental Impact Statements (EISs).
Tribe-Bench: Evaluates identification of tribal names and concerns in project documents.
FedReg-Bench: Evaluates extraction of structured data from Federal Register notices.
Geo-Bench: Covers NEPA map classification, location extraction, and other GIS metadata tasks.

NEPA-Bench currently encompasses over 10,000 evaluation entries, providing a rich dataset for model assessment.

Innovative Tools

In addition to developing benchmarks, the team has established automated and human evaluation procedures to rigorously examine the effectiveness and safety of models and applications prior to public release. The team has also introduced MAPLE, an assessment pipeline built to work with cloud APIs, for evaluating large language models against benchmarks such as NEPA-Bench.
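
As a rough illustration of what evaluating a model against a benchmark involves, the sketch below scores a model by exact-match accuracy over a list of question and reference-answer pairs. It is not MAPLE's interface and does not reflect the NEPA-Bench data format: `BenchmarkEntry`, `normalize`, `exact_match_accuracy`, `SAMPLE_ENTRIES`, and `stub_model` are all hypothetical names chosen for this example, and exact match is only one of many possible metrics.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BenchmarkEntry:
    """One hypothetical evaluation entry: a question and its reference answer."""
    question: str
    reference: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(
    model: Callable[[str], str],
    entries: Iterable[BenchmarkEntry],
) -> float:
    """Return the fraction of entries whose model answer matches the reference."""
    entries = list(entries)
    if not entries:
        raise ValueError("benchmark has no entries")
    correct = sum(
        normalize(model(entry.question)) == normalize(entry.reference)
        for entry in entries
    )
    return correct / len(entries)


# Illustrative entries written for this sketch; they are not taken from NEPA-Bench.
SAMPLE_ENTRIES = [
    BenchmarkEntry("What does NEPA stand for?", "National Environmental Policy Act"),
    BenchmarkEntry("What does EIS stand for?", "Environmental Impact Statement"),
]


def stub_model(question: str) -> str:
    """Stand-in for a real model call (for example, a request to a hosted API)."""
    answers = {"What does NEPA stand for?": "national environmental  policy act"}
    return answers.get(question, "")


if __name__ == "__main__":
    score = exact_match_accuracy(stub_model, SAMPLE_ENTRIES)
    print(f"exact-match accuracy: {score:.2f}")
```

Because the model is passed in as a plain callable, the same scoring loop could wrap a local model or a remote API client. With the stub above, one of the two sample entries matches after normalization, so the script reports an accuracy of 0.50.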

PermitAI is committed to enabling rapid AI model prototyping, allowing researchers to experiment with diverse AI model architectures, algorithms, and preprocessing techniques, thus fostering innovation and efficiency in environmental review and permitting processes.

PermitAI Models and Benchmarks Team

Anurag Acharya (Model and Benchmark, Technical Lead)
Sadie Montgomery (Model and Benchmark, Domain Lead)
Rounak Meyur
Koby Hayashi
Anusha Devulapally
Henry Warmerdam