EXPERT Data
The EXPERT datasets include five years of scientific articles in the nuclear domain, collected from more than ten public sources. The collection consists of three types of data:
- Publication records with raw text extracted from abstracts;
- Abstracts enriched with linguistic annotations (part-of-speech tagging, semantic role labeling, co-reference resolution, topic and embedding vectors, etc.); and
- Structured domain knowledge representations (context and content knowledge graphs extracted from papers).
Requests to use the EXPERT dataset can be made via the Berkeley Data Cloud.
EXPERT Usable and Explainable Analytics
ESTEEM (Evaluating Spatiotemporal Embeddings) is the first descriptive analytics tool in a series of EXPERT usable and explainable analytics. It is designed to support real-time understanding of, and reasoning about, how global proliferation expertise evolves, using publicly available information. Given a keyword of interest, an analyst can track text embedding similarities and contrast the evolution of scientific knowledge, expertise, and capabilities (extracted from scientific papers in Scopus, Web of Science, and OSTI). These comparisons span time and space, are set in the context of real-world events, and cover five countries of interest (India, Pakistan, Iran, North Korea, and Russia) and the United States.
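The idea of tracking embedding similarity for a keyword over time and across countries can be illustrated with a minimal sketch. It is not ESTEEM's implementation. It assumes a hypothetical input of `(country, year, embedding)` tuples, one per abstract. It also assumes each country-year group is summarized by the centroid of its abstract embeddings and scored against the keyword embedding with cosine similarity. The function names and the input schema are illustrative only.

```python
import math
from collections import defaultdict

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: list[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def similarity_timeline(keyword_vec: Vector, docs) -> dict[str, list[tuple[int, float]]]:
    """Score each (country, year) group of abstracts against a keyword embedding.

    docs: iterable of (country, year, embedding) tuples (hypothetical schema).
    Returns {country: [(year, similarity of that group's centroid), ...]}.
    """
    grouped = defaultdict(list)
    for country, year, vec in docs:
        grouped[(country, year)].append(vec)
    timeline = defaultdict(list)
    for country, year in sorted(grouped):  # chronological order within each country
        score = cosine(keyword_vec, centroid(grouped[(country, year)]))
        timeline[country].append((year, score))
    return dict(timeline)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real text embeddings would have hundreds of dimensions.
    docs = [
        ("India", 2019, [1.0, 0.0]),
        ("India", 2019, [1.0, 0.0]),
        ("India", 2020, [0.0, 1.0]),
        ("Iran", 2020, [1.0, 1.0]),
    ]
    print(similarity_timeline([1.0, 0.0], docs)["India"])  # [(2019, 1.0), (2020, 0.0)]
```

Scoring the centroid is one simple design choice. Averaging per-abstract similarities, or using a distribution of scores per group, would be reasonable alternatives.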
XSearch (Expertise and Capability Search) is a novel interactive analytics tool that fuses content and context from large collections of scientific publications. It enables analysts and decision makers to summarize and explore expertise across scientific collaborations, partnerships, and technology development. It supports rapid, reproducible analyses at multiple resolutions (from individual scientists to teams, institutions, and countries), exploring the relationships and dynamics among scientists, papers, capabilities, and institutions over time.
The EXPERT Entity Resolution Tool is an interactive Jupyter widget for disambiguating entities (e.g., nodes, edges, and their attributes) in content and context knowledge graphs. For example, users can combine machine learning with automated text and graph similarity measures to merge near-duplicate entities. In context graphs these entities include scientists, institutions, venues, and papers; in content graphs they include keywords, topics, and tags. Merging them improves the quality of the graph-based representations.
Find the EXPERT Entity Resolution Tool on GitHub
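Merging entities by combining text and graph similarity can be illustrated with a minimal sketch. It is not the widget's actual algorithm, and it omits the machine learning component entirely. Several choices are assumptions made for illustration: `difflib.SequenceMatcher.ratio` for name similarity, Jaccard overlap of graph neighbors for structural similarity, a weighted sum with hypothetical `alpha` and `threshold` parameters, and union-find to group matched nodes into clusters.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1]: difflib's ratio on lowercased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbor_similarity(a: set[str], b: set[str]) -> float:
    """Graph similarity in [0, 1]: Jaccard overlap of neighbor sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def resolve_entities(
    nodes: dict[str, str],
    neighbors: dict[str, set[str]],
    threshold: float = 0.75,  # hypothetical merge cutoff
    alpha: float = 0.5,  # hypothetical weight of text vs. graph similarity
) -> list[list[str]]:
    """Cluster node IDs whose combined similarity reaches the threshold.

    nodes maps node ID -> display name; neighbors maps node ID -> neighbor IDs.
    """
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(nodes), 2):
        score = alpha * name_similarity(nodes[a], nodes[b]) + (1 - alpha) * neighbor_similarity(
            neighbors.get(a, set()), neighbors.get(b, set())
        )
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    nodes = {"s1": "J. Smith", "s2": "J Smith", "s3": "A. Kumar"}
    neighbors = {"s1": {"i1", "p1"}, "s2": {"i1", "p1"}, "s3": {"i2"}}
    print(resolve_entities(nodes, neighbors))  # [['s1', 's2'], ['s3']]
```

Comparing every pair is quadratic in the number of nodes. A practical tool would typically restrict comparisons to candidate pairs (blocking) and present borderline matches to the user for review rather than merging them automatically.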
EXPERT Open-Source Software
The EXPERT toolkit contains next-generation artificial intelligence capabilities to detect, anticipate, and reason about the evolution of proliferation expertise from unstructured, dynamic, multilingual real-world data. To date, the EXPERT GitHub repository includes the following code and pre-trained models for proliferation expertise detection:
- Dynamic context graph extraction (knowledge about relationships between scientists, institutions, venues, and papers over time);
- Dynamic global and local content graph extraction;
- Natural language processing pipeline with part-of-speech tagging, semantic role labeling, syntactic parsing, co-reference resolution, entity and relation extraction, BERT embedding fine-tuning, and topic modeling; and
- Entity resolution widget to disambiguate entities in content and context knowledge graphs.
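A dynamic context graph can be sketched as one graph snapshot per year, built from publication records. This is not the repository's extraction code. The record schema (`paper_id`, `year`, `venue`, and `authors` as `(author, institution)` pairs), the relation names, and both functions are hypothetical. Comparing the edge sets of two snapshots then shows how relationships between scientists, institutions, venues, and papers change over time.

```python
from collections import defaultdict
from itertools import combinations

Edge = tuple[str, str, str]  # (source, relation, target)


def build_context_snapshots(records: list[dict]) -> dict[int, set[Edge]]:
    """Build one context graph (a set of typed edges) per publication year.

    Each record is a hypothetical dict with keys 'paper_id', 'year', 'venue',
    and 'authors' (a list of (author, institution) pairs).
    """
    snapshots: dict[int, set[Edge]] = defaultdict(set)
    for rec in records:
        edges = snapshots[rec["year"]]
        paper = rec["paper_id"]
        edges.add((paper, "published_in", rec["venue"]))
        for author, institution in rec["authors"]:
            edges.add((author, "wrote", paper))
            edges.add((author, "affiliated_with", institution))
        # Undirected co-authorship edges, stored once in sorted name order.
        names = sorted(author for author, _ in rec["authors"])
        for a, b in combinations(names, 2):
            edges.add((a, "coauthor", b))
    return dict(snapshots)


def edge_changes(snapshots: dict[int, set[Edge]], year_a: int, year_b: int):
    """Return (edges added, edges removed) going from year_a to year_b."""
    before = snapshots.get(year_a, set())
    after = snapshots.get(year_b, set())
    return after - before, before - after


if __name__ == "__main__":
    records = [
        {"paper_id": "p1", "year": 2019, "venue": "V1",
         "authors": [("alice", "inst_a"), ("bob", "inst_b")]},
        {"paper_id": "p2", "year": 2020, "venue": "V2",
         "authors": [("alice", "inst_c")]},
    ]
    snaps = build_context_snapshots(records)
    added, removed = edge_changes(snaps, 2019, 2020)
    print(("alice", "affiliated_with", "inst_c") in added)  # True
```

Storing typed triples keeps the sketch dependency-free. A graph library could hold the same edges as a multigraph with a `relation` attribute on each edge.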