PNNL

Mathematics for Artificial Reasoning in Science

Mega AI

PI: Maria Glenski

Objective

Scaling artificial intelligence (AI) to massive-scale foundation models capable of multi-purpose inference, enabling reasoning and generative tasks for science and security mission domains.

  • Climate (fiscal year [FY] 2022) and chemistry (FY22, FY23)
    • pretraining large-scale AI models from scratch (1B+ params)
    • scientific knowledge: text (0.87TB), molecular databases (110K, properties + structures)
  • Cybersecurity and code (FY24)
    • targeted fine-tuning and adaptation to boost performance on multi-purpose tasking, from zero-shot to instruction-tuned settings
    • enable on-premises, mission-informed vulnerability assessment and code characterization, search, and assessment
An overview of several task types supported by our Chemistry model, with an example for chemical entity extraction where domain-pretrained models can outperform larger, open-source SOTA models.

Overview 

State-of-the-art large-scale language models, and multimodal foundation models that incorporate a text modality, are trained on large collections of pretraining data drawn mostly from general-purpose language and/or vision sources. However, performance often degrades when foundation models trained on general-purpose datasets are applied to science and security domains, for example when they must handle the vocabulary shift between general language and domain knowledge in areas like molecular chemistry and climate. By leveraging a large collection of scientific literature, Mega AI focuses on developing next-generation foundation models that address research gaps in large-scale multimodal representation learning, multitask inference, generalizability, rapid adaptivity, and usability of these artificial intelligence technologies.
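The vocabulary shift mentioned above can be illustrated with a toy example. The sketch below is not the Mega AI tokenizer; it assumes a simple greedy longest-match subword segmenter and two small, hypothetical vocabularies, and it measures how many pieces each word is split into. A vocabulary built from general language tends to fragment chemistry terms, while one that includes domain terms keeps them whole.

```python
def tokenize(word, vocab):
    """Greedy longest-match segmentation; unknown spans fall back to single characters."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate first; a single character is always accepted.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def fertility(words, vocab):
    """Average number of pieces per word (higher means more fragmentation)."""
    return sum(len(tokenize(w, vocab)) for w in words) / len(words)


# Hypothetical toy vocabularies, invented for illustration only.
GENERAL_VOCAB = {"the", "water", "model", "meth", "yl", "chlor", "ide"}
DOMAIN_VOCAB = GENERAL_VOCAB | {"methyl", "chloride", "benzene"}

general_words = ["the", "water", "model"]
chemistry_words = ["methyl", "chloride", "benzene"]

if __name__ == "__main__":
    for name, vocab in [("general", GENERAL_VOCAB), ("domain", DOMAIN_VOCAB)]:
        print(name, "vocab, general words:", fertility(general_words, vocab))
        print(name, "vocab, chemistry words:", fertility(chemistry_words, vocab))
```

Pieces per word is only one rough proxy for vocabulary mismatch; it says nothing by itself about downstream task performance.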

Foundation models, also known as neural platforms, are large-scale artificial intelligence models that leverage self-supervised pretraining at scale (for example, large-scale pretraining with zero supervision on unlabeled data) and can be adapted to a wide range of downstream tasks via transfer learning and task-specific fine-tuning.
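To make "zero supervision on unlabeled data" concrete, the sketch below shows how training targets can be derived from raw text alone. It assumes a masked-token objective, which is one common self-supervised setup; this page does not state which objective Mega AI models use, and the function name, `[MASK]` symbol, and default masking rate here are illustrative choices.

```python
import random

MASK = "[MASK]"  # placeholder symbol, assumed for this sketch


def make_masked_example(tokens, mask_prob=0.15, rng=None):
    """Turn unlabeled tokens into (inputs, labels) for a masked-token objective.

    Masked positions carry the original token as the label; all other
    positions carry None, meaning no loss would be computed there.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


if __name__ == "__main__":
    sentence = "sodium chloride dissolves readily in water".split()
    inputs, labels = make_masked_example(sentence, mask_prob=0.3, rng=random.Random(0))
    print(inputs)
    print(labels)
```

No human annotation is involved: the labels are the hidden tokens themselves, which is what lets this style of pretraining scale to very large text collections.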

Analyses performed on the Mega AI project have measured the benefit of in-domain pretraining when leveraging foundation models. They highlight performance gains over both general-language baseline models and general-language baselines that received additional pretraining on the Mega AI pretraining collection.
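Comparisons like these need a common score computed the same way for every model. The page does not say which metrics the project used, so the sketch below assumes one plausible choice for an entity extraction task: exact-match, set-based F1 over extracted entity strings. The gold set and system outputs are made-up placeholders, not project results.

```python
def entity_f1(predicted, gold):
    """Exact-match F1 between a set of predicted entities and a gold set."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder data, invented for illustration only.
gold_entities = {"sodium chloride", "ethanol", "benzene"}
system_outputs = {
    "system_a": {"ethanol", "benzene"},
    "system_b": {"ethanol", "water"},
}

if __name__ == "__main__":
    for name, predicted in system_outputs.items():
        print(name, round(entity_f1(predicted, gold_entities), 3))
```

Scoring every candidate (a from-scratch domain model, a general-language baseline, and a baseline with additional pretraining) with the same function on the same held-out examples keeps the comparison consistent.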

Scaling AI for science and security

Impact

Conventional open-source or commercial approaches may come with:

  • required application programming interface (API) usage (cloud-based querying)
  • restrictions on adaptation, tuning, or appropriate use
  • a lack of transparency about (proprietary) data or training choices that could affect the usability or quality of models used for mission purposes

Mega AI explores the tradeoffs between development choices (pretraining from scratch, fine-tuning off-the-shelf base models, and targeted fine-tuning and/or task prompts) and model performance to support on-premises model use, mission-informed training and tuning of usable large language models (LLMs), and traceable model development and evaluation.

Publications and Presentations

  • Horawalavithana Y.S., E.M. Ayton, S. Sharma, S.A. Howland, M. Subramanian, S.W. Vasquez, and R.J. Cosbey, et al. 2022. "Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned." In Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models, May 2022, Virtual and Dublin, Ireland, 160–172. Stroudsburg, Pennsylvania: Association for Computational Linguistics. doi:10.18653/v1/2022.bigscience-1.12
  • COLING 2022 Workshop on Multimodal/Multipurpose AI Evaluation, Artificial Intelligence for Earth System Predictability (AI4ESP). Volkova S., N.O. Hodas, and T.D. Scheibe. 2021. "AI-Driven Cross-Domain Knowledge Discovery and Hypotheses Generation for Enhanced Earth System Predictability."

Pacific Northwest National Laboratory (PNNL) is managed and operated by Battelle for the Department of Energy