Software Products
ddKS
Comparing samples of high dimensional data is a challenging task with relevance to many fields including climate modeling, high energy physics, and machine learning. The ddKS package computes the differences between two high dimensional distributions quickly using our accelerated, “d-dimensional Kolmogorov-Smirnov distance”.
POC: Alex Hagen
CODE | PAPER
MCL
Minos Computing Library, or ‘MCL’, is a modern task-based, asynchronous programming model and runtime for executing complex scientific workflows on extremely heterogeneous systems.
POC: Roberto Gioiosa
CODE | Paper
TAZeR
Transparent Asynchronous Zero-Copy Remote I/O, or ‘TAZeR’, is a remote I/O framework that reduces effective data access latency in large scale scientific workflows.
POC: Ryan Friese
CODE | Paper
Lamellar
Lamellar is an asynchronous tasking and PGAS runtime for HPC systems developed in RUST.
POC: Ryan Friese
CODE
SODA-Opt
SODA-Opt is a tool that enables identifying segments of applications written in high-level productive programming frameworks (Python, Machine Learning) for hardware acceleration through high-level synthesis tools. SODA-OPT is developed within the MLIR compiler infrastructure.
POC: Antonino Tumeo
CODE | Paper
OpenCGRA
OpenCGRA is an open-source unified framework for modeling, testing, and evaluating specialized Coarse-Grained Reconfigurable Arrays (CGRAs).
POC: Antonino Tumeo
CODE | Paper
Svelto
Svelto is a high-level synthesis methodology for the automated generation of high-throughput accelerators for graph analytics and irregular computation.
POC: Antonino Tumeo
CODE | Paper
HLS of Task Parallel Specification
HLS of Task Parallel Specification is a High-Level Synthesis methodology that generates specialized accelerators starting from high-level parallel programs. The methodology combines together statically scheduled accelerators in a dataflow architecture.
POC: Antonino Tumeo
CODE | Paper
BigFlowSim
BigFlowSim is a workflow I/O simulator-emulator and trace generator that captures several parameters that affect local and remote I/O performance. BigFlowSim generates a large variety of flows within and between tasks of distributed workflows. With BigFlowSim, we have systematically studied TAZeR's performance on different data flows.
POC: Ryan Friese
CODE | Paper
PALM
Palm is a suite of performance modeling tools (Palm, Palm-Task, Representative-Paths, Palm/FastFootprints, MIAMI-NW) to assist performance analysis and predictive model generation. Palm generates models by combining top-down (human-provided) semantic insight with bottom-up static and dynamic analysis. Palm has been used to model irregular applications with sparse data structures and unpredictable access patterns. Recent additions focus on rapid characterization of memory behavior.
POC: Nathan Tallent
CODE | Paper
STM
STM is a temporal motif-based tool for graph characterization and embedding generation.
POC: Sumit Purohit
CODE | Paper
QLiG
QLiG is a Graph-based Query specification to construct structural, approximate queries for Property Graph using high-level concepts such as path, structure, and constraints.
POC: Sumit Purohit
CODE | Paper
GridPack
GridPack is a power grid simulation framework that is designed to simplify the development of power grid simulations that can run on a range of machines, from desktop workstation to high performance computing platforms. GridPACK is written in C++ and consists of a collection of libraries and software modules that can be used to build power grid applications that can run on advanced architectures. In addition, it has many complete application modules that can be used either standalone, to run standard simulation or can be combined in new ways to create more complicated simulation workflows.
POC: Bruce Palmer
CODE | PAPER
TranSEC
TranSEC is a scalable approach to estimating vehicle travel times at the street level using aggregated Uber data and graph representations of the underlying road network. Approach demonstrated for LA and Seattle metro area networks.
POC: Arun Sathanur, Arif Khan
TransBEAM
TransBEAM is a set of classes and methods to analyze and compare different state-of-the-art data-driven building energy modeling techniques. The module also implements transfer learning to make use of both the simulation data and sparse field data.
POC: Milan Jain, Arun Sathanur
CODE
Qual2M
Quantitative Learned Latency Model, or ‘Qual2M’, is the implementation of a Machine Learning methodology for quantitative performance of optimized latency-sensitive code on CPUs. To capture the cost distribution and the most severe bottlenecks, Qual2M combines classification and regression using ensemble decision trees, which also provide some interpretability.
POC: Arun Sathanur, Nathan Tallent
CODE
NWQ-Sim
NWQ-Sim is a quantum system simulation environment running on classical heterogeneous supercomputers. It currently comprises a state-vector simulator (SV-Sim) and a density-matrix simulator. It supports Intel/AMD/IBM CPUs, NVIDIA/AMD GPUs, Xeon Phi, etc. as the backends, and Q#/Qiskit/OpenQASM as the frontends. NWQ-Sim has been deployed on OLCF Summit, ALCF Theta and NERSC Perlmutter, scaling-out to more than a thousand of CPUs or GPUs. NWQ-Sim is currently used for noisy simulation of quantum chemistry, optimization, linear algebra, and communication applications. NWQ-Sim is supported by the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers, Quantum Science Center (QSC).
POC: Ang Li
NWQ-Sim CODE | SV-SIM cODE | dm-sIM cODE | qir-aLLIANCE cODE | pAPER 1 | pAPER 2
Tartan
Tartan is a multi-GPU benchmark suite used for evaluating modern GPU interconnect, such as NVLink, NVSwitch, PCIe, etc. It has three categories: a microbenchmark for measuring interconnect latency, throughput, communication efficiency, NUMA effect, etc. for both peer-to-peer and collective communications; a scale-up benchmark with seven applications for intra-node (i.e., single-node-multi-GPUs) evaluation; and a scale-out benchmark with seven applications for inter-node (i.e., multi-nodes) evaluation. Tartan was supported by U.S. DOE Office of Science, Office of Advanced Scientific Computing Research, under award 66150: "CENATE - Center for Advanced Architecture Evaluation".
POC: Ang Li
CODE | Paper 1 | Paper 2
SBNN
SBNN is a GPU-accelerated high-performance inference engine for binarized neural network. It has two version: “BSTC” that runs on GPU CUDA cores leveraging GPU’s native bit instructions for ultra-low-latency BNN inference, achieving 1000X single-image (from ImageNet) inference latency reduction over TensorFlow; “TCBNN” that runs on Turing and Ampere GPU tensor-cores, achieving high throughput. SBNN was supported by the DS-HPC project under PNNL’s Deep-Science LDRD Initiative.
POC: Ang Li
CODE | Paper 1 | Paper 2
ARENA
ARENA is an architecture and runtime prototype for the next generation reconfigurable HPC. ARENA is a non-von-Neumann architecture adopting an asynchronous, locality-driven, task-based execution model for reconfigurable platforms. ARENA brings computation to the data rather than the reverse, significantly reducing (unnecessary) data-movement and boost execution performance. ARENA has been demonstrated on multi-FPGA and multi-CGRA platforms. The ARENA runtime and architecture design is supported by the Compute-Flow-Architecture (CFA) project under PNNL’s Data-Model-Convergence (DMC) LDRD Initiative.
POC: Ang Li
CODE | Paper